Karl John Holzinger


The Psychometric Society suffered a great loss with the sudden death of 
one of its former presidents—Karl John Holzinger—who was stricken with a 
heart attack at his home on the morning of January 15, 1954, at the age of 61. 

Out of a deep personal sense of loss the writer was prompted to set down 
a few words about Mr. Holzinger. My first acquaintance with Professor 
Holzinger goes back 20 years, when I was a student in his statistics class. I had 
the feeling, undoubtedly shared by hundreds of his former students, that he 
was so sincere in trying to “get across” to the student and his manner so kind, 
humorous, and gentle, that each student made an extra effort to grasp what 
he was teaching. His pedagogical ability was supported by a sound scientific 
background and philosophy. He commanded my immediate respect and admir- 
ation. As our professional interactions increased and friendship developed 
during the next seven years of close working relationship, these early im- 
pressions were strengthened. His many wonderful traits as a teacher and friend 
appeared again and again, whether at serious seminars on factor analysis or, 
in lighter moments, on the tennis court or at the bridge table. 

Mr. Holzinger’s publications in statistics and statistical applications in 
education and psychology leave an enviable record for posterity. Although 
these publications are important contributions to scientific knowledge, it was 
his influence upon the many students and assistants which he considered his 
stake in the future. 

Mr. Holzinger spent almost his entire professional life on the campus of 
the University of Chicago, where he was in the Department of Education for 
32 years. He received the Ph.D. summa cum laude in mathematics and educa- 
tion from the University of Chicago in 1922, and shortly thereafter he went 
to the University of London to study statistical theory with Karl Pearson and 
statistical applications with Charles Spearman. This experience left its imprint 
on Professor Holzinger and was reflected in his teaching, research, and writings.
The only other times that he was away from the University of Chicago
were for service in the Psychological Corps of the United States Army during
World War I and, in recent summers, as a visiting professor at the University
of California at Berkeley.

Professor Holzinger was recognized internationally for his contributions 
to the field of psychological statistics. He was honored professionally by being 
elected president of the Psychometric Society in 1940 and vice-president of 
the American Statistical Association in 1933; he was awarded the prize of the 
Education Research Association in 1941; and he served on the Editorial 
Board of Psychometrika from 1937 to 1948 and was joint editor of the Journal 
of Educational Psychology from 1949 to the time of his death. 













96 PSYCHOMETRIKA 


While he is best known for his many contributions to factor analysis, 
dating back thirty years to his first paper with Spearman on “Sampling Errors 
of Tetrad Differences,” his research and applications of statistical theory span 
a much broader field. He is responsible for the statistical theory in two books: 
Twins (with Freeman and Newman) and Influence of Environment on Intelli- 
gence, Achievement, and Conduct (with Freeman). Of his many papers, mono- 
graphs, and books on factor analysis mention might be made of two principal 
works in this area: Factor Analysis (with Harman) and Preliminary Reports on 
Spearman-Holzinger Unitary Trait Study, Numbers 1-9 (in which his Bi- 
factor Theory is first presented). Finally, his writings included contributions 
to the pedagogical field, by way of textbooks and computational tables and 
aids. His Statistical Methods is well known, and we hope that a manuscript
tentatively entitled “Primer of Statistical Logic,” on which he was working
at the time of his death, may soon be published.

As regards Holzinger’s work in factor analysis, it must be said that, above 
all, he was recognized as the principal proponent of the Spearman Two-Factor 
Theory in this country. He admired Spearman’s monumental work in developing
a psychological theory involving a single general factor “g” and a number
of specific factors “s”. But he was even more impressed by the sound statistical
model for this theory, and worked very closely with Spearman on these
statistical problems. However, as time went on, he realized some of the inade- 
quacies of the Two-Factor Theory to deal with the complex psychological 
test batteries that came into being during the 1930’s. In developing a broader 
theory to cope with the greater demand, he nevertheless was guided by Spear- 
man’s earlier work, and in presenting his new Bi-factor Theory he claimed no 
more for it than an extension of Spearman’s work. It was that—but it was also 
an alternative multiple factor theory. Later he did considerable work on the 
comparison of various factorial solutions, especially the relationship between 
his Bi-factor Theory and Thurstone’s Multiple-Factor Theory. In the 1940’s 
he had swung even more in the direction of oblique multiple-factor analysis; 
and, under the characteristically unpretentious title “The Simple Method of
Factor Analysis,” he was among the first to develop a multiple group method
of analysis with simple computing procedures. 

Karl John Holzinger will long be remembered for his many contributions 
to factor analysis and related statistical work. Like his father before him, 
he was a great teacher with whom to study and a most stimulating scholar 
with whom to work, because he seemed to get such great enjoyment out of his 
teaching and research. He was a great, yet modest, man and while his later 
years were saddened by the untimely death of his son, his life was rounded out 
with happy family relationships, wide and friendly associations, and well 
deserved professional success. 

Harry H. Harman


Pacific Palisades, California, April 1954
MULTIVARIATE INFORMATION TRANSMISSION*†


WILLIAM J. McGILL
MASSACHUSETTS INSTITUTE OF TECHNOLOGY


A multivariate analysis based on transmitted information is presented. 
It is shown that sample transmitted information provides a simple method 
for measuring and testing association in multi-dimensional contingency 
tables. Relations with analysis of variance are pointed out, and statistical tests 
are described. 


Several recent articles in the psychological journals have shown how 
ideas derived from communication theory are being applied in psychology. 

It is not widely understood, however, that the tools made available by 
communication theory are useful for analyzing data whether or not we 
believe the human organism is best described as a communications system. 

This paper will present an extension of Shannon’s (10) measure of trans- 
mitted information. It will be shown that transmitted information leads 
to a simple multivariate analysis of contingency data, and to appropriate 
statistical tests. 


1. Basic Definitions 


Let us consider a communication channel and its input and output. 
Transmitted information measures the amount of association between the 
input and output of the channel. If input and output are perfectly correlated, 
all the input information is transmitted. On the other hand, if input and 
output are independent, no information is transmitted. Naturally most 
cases of information transmission are found between these extremes. There is 
some uncertainty at the receiver about what was sent. Some information is 
transmitted and some does not get through. 

We are interested not in what the transmitted information is, but in 
the amount of information transmitted. Suppose that we have a discrete 
input variable, x, and a discrete output variable, y. Since x is discrete, it 
takes on values or signals k = 1, 2, 3, ..., X with probabilities indicated
by p(k). Similarly, y assumes values m = 1, 2, 3, ..., Y with probabilities
p(m). If it happens that k is sent and m is received, we can speak of the 
joint input-output event (k,m). This joint event has probability p(k,m). 

*This work was supported in part by the Air Force Human Factors Operations 
Research Laboratories, and in part jointly by the Army, Navy, and Air Force under 
contract with the Massachusetts Institute of Technology. 

†Several of the indices and tests discussed in this paper have been developed
independently by J. E. Keith Smith (11) at the University of Michigan, and by W. R. Garner
at Johns Hopkins University.
The rules governing the selection of signals at either end of the channel must
be constructed so that

$$\sum_k p(k) = \sum_m p(m) = \sum_{k,m} p(k,m) = 1.$$


Under these conditions, assuming successive signals are independent, the
amount of information transmitted in “bits” per signal is defined as

$$T(x;y) = H(x) + H(y) - H(x,y), \tag{1}$$

where

$$H(x) = -\sum_k p(k)\log_2 p(k),$$
$$H(y) = -\sum_m p(m)\log_2 p(m),$$
$$H(x,y) = -\sum_{k,m} p(k,m)\log_2 p(k,m).$$


One “bit” is equal to $-\log_2 \tfrac{1}{2}$ and represents the information conveyed by
a choice between two equally probable alternatives. Our development will use
the bit as a unit, since this is the convention in information theory, but
any convenient unit may be substituted by changing the base of the logarithm.

If there is a relation between x and y, $H(x) + H(y) > H(x,y)$ and
the size of the inequality is just $T(x;y)$. On the other hand, if x and y are
independent, $H(x,y) = H(x) + H(y)$ and $T(x;y)$ is zero. It can be shown
that $T(x;y)$ is never negative.
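A minimal sketch of equation (1) in Python follows. It assumes the joint distribution is supplied as a nested list `joint[k][m]` holding $p(k,m)$. That representation and the helper names are choices made here for illustration; they do not come from the paper. The sketch reproduces the two limiting cases just described: perfect correlation transmits all of the input information, and independence transmits none.

```python
from math import log2

def entropy(probs):
    """H = -sum p*log2(p) in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def transmitted(joint):
    """T(x;y) = H(x) + H(y) - H(x,y) for joint[k][m] = p(k,m), equation (1)."""
    p_x = [sum(row) for row in joint]          # p(k): sum over m
    p_y = [sum(col) for col in zip(*joint)]    # p(m): sum over k
    p_xy = [p for row in joint for p in row]   # p(k,m)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

perfect = [[0.5, 0.0], [0.0, 0.5]]          # output always copies the input
independent = [[0.25, 0.25], [0.25, 0.25]]  # p(k,m) = p(k) p(m)
noisy = [[0.4, 0.1], [0.1, 0.4]]            # output matches input 80 per cent of the time

for name, joint in [("perfect", perfect), ("independent", independent), ("noisy", noisy)]:
    print(f"{name:12s} T = {transmitted(joint):.4f} bits")
```

The perfect and independent channels give 1 bit and 0 bits. The noisy channel falls strictly between these extremes.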

The presentation to this point has been an outline of the properties of 
the measure of transmitted information as set forth by Shannon (10). These 
properties may be summarized by stating that the amount of information 
transmitted is a bivariate, nonnegative quantity that measures the association
between input and output of a channel. There are, however, very few restrictions
on how a channel may be defined. The input-output relations that
occur in many psychological contexts are certainly possible channels. Consequently
we can measure transmitted information in these contexts and
anticipate that the results will be interesting.


2. Sample Information

Our development will be based on sample measures of information, i.e.,
on measures of information constructed from relative frequencies.
Suppose that we make n observations of events (k,m). We identify
$n_{km}$ as the number of times that k was sent and m was received. This means
that

$$n_k = \sum_m n_{km}, \qquad n_m = \sum_k n_{km}, \qquad n = \sum_{k,m} n_{km},$$

where $n_k$ is the number of times that k was sent, $n_m$ is the number of times
that m was received, and n is the total number of observations. A particular
experiment can then be represented by a contingency table with XY cells
and entries $n_{km}$.

We may estimate the probabilities $p(k)$, $p(m)$, and $p(k,m)$ with $n_k/n$,
$n_m/n$, and $n_{km}/n$, respectively. Sample transmitted information, $T'(x;y)$, is
defined as

$$T'(x;y) = H'(x) + H'(y) - H'(x,y), \tag{2}$$

where $H'(x)$, $H'(y)$, and $H'(x,y)$ are constructed from relative frequencies
instead of from probabilities. [Throughout the paper a prime is used over a
quantity to indicate the maximum likelihood estimator of the same quantity
without the prime, e.g., $T'(x;y)$ is an estimator for $T(x;y)$.] As before, $T'(x;y)$
is the amount of transmitted information (in the sample) measured in “bits”
per signal.

Since it is difficult to manipulate logs of relative frequencies, we will
introduce an easier notation:

$$s_{km} = \frac{1}{n}\sum_{k,m} n_{km}\log_2 n_{km},$$
$$s_k = \frac{1}{n}\sum_k n_k\log_2 n_k,$$
$$s_m = \frac{1}{n}\sum_m n_m\log_2 n_m,$$
$$s = \log_2 n.$$

Expressions involving sample measures of information are easier to
handle in this notation. Each sample uncertainty is s minus the corresponding
s-term; for example, $H'(x) = s - s_k$ and $H'(x,y) = s - s_{km}$. Hence $T'(x;y)$ becomes

$$T'(x;y) = s - s_k - s_m + s_{km}. \tag{3}$$

Equations (2) and (3) are equivalent expressions for $T'(x;y)$. When
we write equations like (3), we shall say that these equations are written in
s-notation. Thus (3) is (2) in s-notation.
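The equivalence of (2) and (3) is easy to check numerically. The sketch below uses a hypothetical 3 × 3 table of counts and helper names chosen here for illustration (`s_term`, `h_prime`, `margins`). It computes $T'(x;y)$ once from relative frequencies and once in s-notation. The two results should agree up to floating-point rounding.

```python
from math import log2

def s_term(counts):
    """(1/n) * sum of c*log2(c), with 0*log2(0) taken as 0 and n = sum(counts)."""
    n = sum(counts)
    return sum(c * log2(c) for c in counts if c > 0) / n

def h_prime(counts):
    """Sample uncertainty H' computed directly from relative frequencies c/n."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def margins(table):
    """Return (n_k, n_m, n_km) for a table given as table[k][m]."""
    n_k = [sum(row) for row in table]
    n_m = [sum(col) for col in zip(*table)]
    n_km = [c for row in table for c in row]
    return n_k, n_m, n_km

def t_prime_eq2(table):
    n_k, n_m, n_km = margins(table)
    return h_prime(n_k) + h_prime(n_m) - h_prime(n_km)

def t_prime_eq3(table):
    n_k, n_m, n_km = margins(table)
    s = log2(sum(n_km))
    return s - s_term(n_k) - s_term(n_m) + s_term(n_km)

# Hypothetical counts n_km (rows: input k, columns: output m).
table = [[20, 5, 1],
         [4, 18, 6],
         [2, 7, 22]]
print(t_prime_eq2(table), t_prime_eq3(table))
```

The s-notation version works on the raw integer counts. This is the practical advantage that motivates the notation.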


3. Three-Dimensional Transmitted Information

Now let us extend the definition of transmitted information to include
two sources, u and v, that transmit to y. To accomplish this we replace x
in equation (2) with u,v and we find that

$$T'(u,v;y) = H'(u,v) + H'(y) - H'(u,v,y), \tag{4}$$

where x has been subdivided into two classes, u and v. The possible values
of u are i = 1, 2, 3, ..., U, while v assumes values j = 1, 2, 3, ..., V. The
subdivision is arranged so that the range of values of u and v jointly constitutes
the possible values of x. This means that the input event, k, can be replaced
by the joint input event (i,j). Consequently we have

$$n_k = n_{ij},$$

and the direct substitution of u,v for x in (2) is legitimate.
Our new term, $T'(u,v;y)$, measures the amount of information transmitted
when u and v transmit to y. It is evident, however, that the direction
of transmission is irrelevant, for examination of (4) reveals that

$$T'(u,v;y) = T'(y;u,v).$$

This means that nothing is gained formally by distinguishing transmitters
from receivers. The amount of information transmitted is a measure of
association between variables. It does not respect the direction in which
the information is travelling. On the other hand, we cannot permute symbols
at will, for

$$T'(u,y;v) = H'(u,y) + H'(v) - H'(u,v,y),$$

and this is not necessarily equal to $T'(u,v;y)$.
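A small hypothetical case shows both points. Suppose y simply copies u, while v is unrelated noise. Reversing the direction leaves $T'(u,v;y)$ unchanged, but regrouping the symbols as $T'(u,y;v)$ does not. The sketch below stores a three-dimensional table as a dictionary keyed by $(i,j,m)$ and computes every term in s-notation, using $H' = s - (\text{s-term})$. The dictionary representation and the helper names are illustrative choices made here.

```python
from collections import Counter
from math import log2

def s_term(table, axes):
    """s-term for the variables at positions `axes`: (1/n) * sum of c*log2(c)
    over the marginal table on those axes; axes=() gives s = log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

def t_prime(table, a, b):
    """T'(a;b) = H'(a) + H'(b) - H'(a,b), written in s-notation."""
    return s_term(table, ()) - s_term(table, a) - s_term(table, b) + s_term(table, a + b)

# Hypothetical data keyed (i, j, m) for (u, v, y): y copies u, v is unrelated.
table = {(i, j, i): 25 for i in range(2) for j in range(2)}
u, v, y = (0,), (1,), (2,)

t_uv_to_y = t_prime(table, u + v, y)   # T'(u,v;y)
t_y_to_uv = t_prime(table, y, u + v)   # T'(y;u,v): direction reversed
t_uy_to_v = t_prime(table, u + y, v)   # T'(u,y;v): symbols regrouped
print(t_uv_to_y, t_y_to_uv, t_uy_to_v)
```

The first two values should both be 1 bit. The third should be 0 bits, up to rounding.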

Our aim now is to measure $T'(u,v;y)$ and then to express $T'(u,v;y)$ as
a function of the bivariate transmissions between u and y, and v and y.
Computation of $T'(u,v;y)$ is not difficult. Our observations of the joint
event (i,j,m) organize themselves into a three-dimensional contingency table
with UVY cells and entries $n_{ijm}$. We can compute the quantities in (4) from
this table, or we can write

$$T'(u,v;y) = s - s_m - s_{ij} + s_{ijm}, \tag{5}$$

where

$$s_{ijm} = \frac{1}{n}\sum_{i,j,m} n_{ijm}\log_2 n_{ijm},$$

and the other s-terms are defined by analogy with the s-terms in equation (3).

Now suppose we want to study transmission between u and y. We
may eliminate v in two ways. First let us reduce the three-dimensional
contingency table to two dimensions by summing over v. The entries in the
reduced table are

$$n_{im} = \sum_j n_{ijm}.$$

We have for the transmitted information between u and y,

$$T'(u;y) = s - s_i - s_m + s_{im}. \tag{6}$$


The second way to eliminate v is to compute the transmission between u and
y separately for each value of v and then average these together. This transmitted
information will be called $T'_v(u;y)$, where

$$T'_v(u;y) = \sum_j \frac{n_j}{n}\left[T'_j(u;y)\right], \tag{7}$$

and $T'_j(u;y)$ is information transmitted between u and y for a single value
of v, namely j. It is readily shown that

$$T'_v(u;y) = s_j - s_{ij} - s_{jm} + s_{ijm}. \tag{8}$$

We see that $T'_v(u;y)$ is written in the same way as $T'(u;y)$ except that the
subscript j is added to each of the s-terms.
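Equation (8) can be checked against the averaging definition (7) directly. The sketch below does this with a hypothetical randomly filled $U \times V \times Y$ table `n3[i][j][m]`. It computes $T'_j(u;y)$ slice by slice, each slice using its own total $n_j$, and weights slice j by $n_j/n$ as in (7). It then compares the result with the pooled s-term form (8). The helper names and the random data are illustrative only.

```python
import random
from math import log2

def plogp_sum(counts):
    """Sum of c*log2(c) over the counts, taking 0*log2(0) as 0."""
    return sum(c * log2(c) for c in counts if c > 0)

def t_from_2d(tab):
    """T'(u;y) for one 2-D table tab[i][m], using that table's own total."""
    n_tab = sum(map(sum, tab))
    rows = [sum(r) for r in tab]
    cols = [sum(c) for c in zip(*tab)]
    cells = [c for r in tab for c in r]
    return log2(n_tab) - (plogp_sum(rows) + plogp_sum(cols) - plogp_sum(cells)) / n_tab

rng = random.Random(1)                      # hypothetical counts n3[i][j][m]
U, V, Y = 3, 4, 2
n3 = [[[rng.randint(1, 12) for m in range(Y)] for j in range(V)] for i in range(U)]
n = sum(n3[i][j][m] for i in range(U) for j in range(V) for m in range(Y))

# Equation (7): weight the transmission within slice j by n_j / n.
t_avg = 0.0
for j in range(V):
    slice_j = [[n3[i][j][m] for m in range(Y)] for i in range(U)]
    n_j = sum(map(sum, slice_j))
    t_avg += n_j / n * t_from_2d(slice_j)

# Equation (8): the same quantity from pooled s-terms.
s_j = plogp_sum(sum(n3[i][j][m] for i in range(U) for m in range(Y)) for j in range(V)) / n
s_ij = plogp_sum(sum(n3[i][j]) for i in range(U) for j in range(V)) / n
s_jm = plogp_sum(sum(n3[i][j][m] for i in range(U)) for j in range(V) for m in range(Y)) / n
s_ijm = plogp_sum(n3[i][j][m] for i in range(U) for j in range(V) for m in range(Y)) / n
t_eq8 = s_j - s_ij - s_jm + s_ijm
print(t_avg, t_eq8)
```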

There are three different pairs of variables in a three-dimensional contingency
table. For example, the two equations for transmission between
v and y are written

$$T'(v;y) = s - s_j - s_m + s_{jm}, \tag{9}$$
$$T'_u(v;y) = s_i - s_{ij} - s_{im} + s_{ijm}. \tag{10}$$

Finally we may study transmission between u and v, i.e.,

$$T'(u;v) = s - s_i - s_j + s_{ij}, \tag{11}$$
$$T'_y(u;v) = s_m - s_{im} - s_{jm} + s_{ijm}. \tag{12}$$


With these results in mind let us reconsider the information transmitted
between u and y. If v has an effect on transmission between u and y, then
$T'_v(u;y) \neq T'(u;y)$. One way to measure the size of the effect is by

$$A'(uvy) = T'_v(u;y) - T'(u;y),$$
$$A'(uvy) = -s + s_i + s_j + s_m - s_{ij} - s_{im} - s_{jm} + s_{ijm}. \tag{13}$$

A few more substitutions will show that

$$A'(uvy) = T'_v(u;y) - T'(u;y) = T'_u(v;y) - T'(v;y) = T'_y(u;v) - T'(u;v). \tag{14}$$


In view of this symmetry, we may call A’(uvy) the u-v-y interaction informa- 
tion. We see that A’(uvy) is the gain (or loss) in sample information trans- 
mitted between any two of the variables, due to additional knowledge of the 
third variable. 

Now we can express the three-dimensional information transmitted
from u,v to y, i.e., $T'(u,v;y)$, as a function of its bivariate components, for

$$T'(u,v;y) = T'(u;y) + T'(v;y) + A'(uvy), \tag{15}$$
$$T'(u,v;y) = T'_v(u;y) + T'_u(v;y) - A'(uvy). \tag{16}$$
Equations (15) and (16) taken together mean that $T'(u,v;y)$ can be represented
by a diagram with overlapping circles as shown in Figure 1. The diagram
assumes what we shall call “positive” interaction between u, v, and y. Interaction
is positive when the effect of holding one of the interacting variables
constant is to increase the amount of association between the other two.
This means that $T'_v(u;y) > T'(u;y)$ and $T'_u(v;y) > T'(v;y)$. [Because of (14),
if one of these inequalities holds, both must hold.] Later on, however, we
shall show that interaction may be negative. When this happens, relations
between the interacting variables are reversed, and the diagram in Figure 1
is no longer strictly correct.

[Figure 1: overlapping-circle diagram with regions labeled $T'_v(u;y)$, $T'_u(v;y)$, $T'(v;y)$, and $T'(u,v;y)$.]
Figure 1. Schematic diagram of the components of three-dimensional transmitted
information. The diagram shows that three-dimensional transmission can be
analyzed into a pair of bivariate transmissions plus an interaction term.
The meanings of the symbols are explained in the text.
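The identities (13) through (16) are exact algebraic consequences of the s-notation, so any table must satisfy them. The sketch below checks them on hypothetical random counts. It uses one helper, `t_prime(table, a, b, given)`, which adds the `given` subscripts to every s-term. This is the same rule that turns (6) into (8), so the helper returns the averaged transmission $T'_v(u;y)$ when `given=v`. The helper names and the data are illustrative, not the author's.

```python
import random
from collections import Counter
from math import log2

def s_term(table, axes):
    """(1/n) * sum of c*log2(c) over the marginal on `axes`; axes=() gives log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

def t_prime(table, a, b, given=()):
    """T'(a;b), or its average over the values of `given` (e.g. T'_v(u;y))."""
    g = tuple(given)
    return (s_term(table, g) - s_term(table, g + a)
            - s_term(table, g + b) + s_term(table, g + a + b))

def st(*axes):
    """Shorthand for s-terms of `table`: st() is s, st(0, 2) is s_im, and so on."""
    return s_term(table, axes)

rng = random.Random(2)                      # hypothetical counts n_ijm
table = {(i, j, m): rng.randint(1, 15)
         for i in range(3) for j in range(3) for m in range(4)}
u, v, y = (0,), (1,), (2,)

a_uvy = t_prime(table, u, y, given=v) - t_prime(table, u, y)     # first line of (13)
a_formula = (-st() + st(0) + st(1) + st(2)
             - st(0, 1) - st(0, 2) - st(1, 2) + st(0, 1, 2))     # second line of (13)
a_alt1 = t_prime(table, v, y, given=u) - t_prime(table, v, y)    # second form in (14)
a_alt2 = t_prime(table, u, v, given=y) - t_prime(table, u, v)    # third form in (14)

t_joint = t_prime(table, u + v, y)                               # T'(u,v;y)
eq15 = t_prime(table, u, y) + t_prime(table, v, y) + a_uvy
eq16 = t_prime(table, u, y, given=v) + t_prime(table, v, y, given=u) - a_uvy
print(a_uvy, a_alt1, a_alt2)
print(t_joint, eq15, eq16)
```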


4. Components of Response Information 


The multivariate model of information transmission is useful to us 
because the situations treated by communication theory are not the same as 
those we deal with in psychological applications. The engineer is usually able 
to restrict himself to transmission from a single information source. He 
knows the statistical properties of the source, and when he speaks of noise he 
means random noise. This kind of precision is seldom available to us. In our 
experiments we generally do not know in advance how many sources are 
transmitting information. We must therefore be careful not to confuse 
statistical noise with the experimenter’s ignorance. 

The bivariate model of transmitted information provided by communi- 
cation theory tells us to attribute to random noise whatever uncertainty there 




















W. J. MCGILL 103 


is in specifying the response when the stimulus is known (1). Consequently, 
if several sources transmit information to responses, the bivariate model 
will certainly fail to discriminate effects due to uncontrolled sources from 
those due to random variability. On the other hand, the multivariate model 
can measure the effects due to the various transmitting sources. For example, 
in three-dimensional transmission we find that 


$$H'(y) = H'_{uv}(y) + T'(u;y) + T'(v;y) + A'(uvy), \tag{17}$$

where $H'(y) = s - s_m$ and $H'_{uv}(y) = s_{ij} - s_{ijm}$.

We see that $H'(y)$, the response information, has been analyzed into
an error term plus a set of correlation terms due to the input variables. The
error term, $H'_{uv}(y)$, is the residual or unexplained variability in the output,
y, after the information due to the inputs, u and v, has been removed. In
bivariate information transmission, the response information is analyzed less
precisely. For example, we may have

$$H'(y) = H'_u(y) + T'(u;y). \tag{18}$$

In this case the error term is $H'_u(y)$ because only one input, u, is recorded.
Shannon (10) showed that

$$H'_u(y) \geq H'_{uv}(y).$$

In other words the error term, when only u is controlled, cannot be increased
if we also control v. In fact

$$H'_u(y) = H'_{uv}(y) + T'_u(v;y). \tag{19}$$

Equation (19) is proved by expanding both sides in s-notation. Thus
if u and v are stimulus variables that transmit information via responses, y,
we have an error term, $H'_u(y)$, provided we keep track of only one of the
inputs, namely, u. However, this error term contains a still smaller error
term as well as the information transmitted from v. Controlling v is thus
seen to be equivalent to extracting the association between v and y from the
noise. Multivariate transmitted information is essentially information
analyzed from the noise part of bivariate transmission.
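The partition (17), the identity (19), and Shannon's inequality can all be confirmed on any contingency table. In the sketch below, the conditional uncertainty $H'_g(y)$ is written as $s_g - s_{g,y}$, which matches $H'_u(y) = s_i - s_{im}$ and $H'_{uv}(y) = s_{ij} - s_{ijm}$. The table is hypothetical random data, and the helper names are assumptions of this sketch.

```python
import random
from collections import Counter
from math import log2

def s_term(table, axes):
    """(1/n) * sum of c*log2(c) over the marginal on `axes`; axes=() gives log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

def t_prime(table, a, b, given=()):
    """T'(a;b), averaged over `given` when it is nonempty."""
    g = tuple(given)
    return (s_term(table, g) - s_term(table, g + a)
            - s_term(table, g + b) + s_term(table, g + a + b))

def h_cond(table, target, given=()):
    """Sample uncertainty of `target` with `given` held constant: s_g - s_(g,target)."""
    g = tuple(given)
    return s_term(table, g) - s_term(table, g + target)

rng = random.Random(3)                      # hypothetical counts n_ijm
table = {(i, j, m): rng.randint(1, 10)
         for i in range(4) for j in range(3) for m in range(5)}
u, v, y = (0,), (1,), (2,)

h_y = h_cond(table, y)                      # response information H'(y)
h_y_given_uv = h_cond(table, y, u + v)      # error term H'_uv(y)
h_y_given_u = h_cond(table, y, u)           # error term H'_u(y)
a_uvy = t_prime(table, u, y, given=v) - t_prime(table, u, y)

eq17 = h_y_given_uv + t_prime(table, u, y) + t_prime(table, v, y) + a_uvy
eq19 = h_y_given_uv + t_prime(table, v, y, given=u)
print(h_y, eq17)
print(h_y_given_u, eq19, h_y_given_uv)
```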


5. An Example 


The kind of analysis that multivariate information transmission yields 
can be illustrated by a set of data obtained from one subject in an experiment 
on frequency judgment. 

Four equally loud tones, 890, 925, 970, and 1005 cycles per second 
were presented to the subject one at a time in random order. Each tone was 
3 second long and separated by about 3 seconds from the next tone. During 
preliminary training the subject learned to identify the tones by pairing them 
with four response keys. In experimental sessions, a loud masking noise was
turned on and a random sequence of 250 tones was presented against the
noise background. A flashing light told the subject when the stimulus occurred,
and he was instructed to guess if in doubt about which one of the four tones
it was.

One object of the experiment was to find weights for both the frequency
stimulus and the immediately preceding response in determining which key
the subject would press. Tests were run at several signal-to-noise ratios.
The data presented here were obtained when the signal-to-noise ratio was
close to the masked threshold.

In order to calculate weights, we can consider the experiment as an
example of three-dimensional transmission. Our analysis is based on the
responses to the 125 even-numbered stimuli. The odd-numbered responses are
considered as the context in which the subject judged the even-numbered
stimuli. The odd-numbered stimuli are ignored in this analysis.

The stimuli will be designated as the variable u. Last previous responses
are called “presponses” and they will be indicated by the variable v. These
are the inputs. Current responses are represented by y. This is the output
variable. Thus we can identify the joint event (i,j,m) as the occurrence of
response m to stimulus i, following presponse j. Failure to respond is considered
as a possible response. Consequently there are four stimulus categories
and five response categories.

The subject’s responses to the 125 test stimuli were sorted into a
4 × 5 × 5 contingency table. Two of the reduced tables that were obtained
from this master table are reproduced here in order to illustrate our computations.

TABLE 1
Stimulus-Response Frequency Table

                      Stimulus
  Response       1     2     3     4      Σ
     0           1     3     2     1      7
     1           5     2     2     1     10
     2          12    10    13    12     47
     3           8    10    12     7     37
     4           5     5     4    10     24
     Σ          31    30    33    31    125

TABLE 2
Presponse-Response Frequency Table

                         Presponse
  Response       0     1     2     3     4      Σ
     0           1     2     3     0     1      7
     1           1     1     4     3     1     10
     2           2    13     8    20     4     47
     3           3     7    12     6     9     37
     4           3     3     0    15     3     24
     Σ          10    26    27    44    18    125

For example, the Stimulus-Response table, Table 1, has entries
$n_{im}$. The calculation for $s_{im}$ goes as follows:


$$\begin{aligned}
s_{im} &= \tfrac{1}{125}\left[1\log_2 1 + 5\log_2 5 + 12\log_2 12 + \cdots + 7\log_2 7 + 10\log_2 10\right],\\
s_{im} &= 374.05750/125,\\
s_{im} &= 2.99246.
\end{aligned}$$

In the same way, $s_{jm}$ is computed from the figures for $n_{jm}$ in the Presponse-Response
table, Table 2:

$$\begin{aligned}
s_{jm} &= \tfrac{1}{125}\left[1\log_2 1 + 1\log_2 1 + 2\log_2 2 + \cdots + 9\log_2 9 + 3\log_2 3\right],\\
s_{jm} &= 372.38710/125,\\
s_{jm} &= 2.97910.
\end{aligned}$$

We obtain the value for $s_i$ from the $n_i$ in the bottom marginal of Table 1:

$$\begin{aligned}
s_i &= \tfrac{1}{125}\left[31\log_2 31 + 30\log_2 30 + 33\log_2 33 + 31\log_2 31\right],\\
s_i &= 620.83188/125,\\
s_i &= 4.96665.
\end{aligned}$$

The computation for s is based on the total number of measurements:

$$s = \log_2 125 = 6.96579.$$
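The marginal s-terms can be recomputed from the two reduced tables. The counts below are our reading of Tables 1 and 2, with rows for responses 0 to 4 and columns for stimuli or presponses. They are checked here only by whether they reproduce the marginals and the s-terms reported in the text. The two pooled terms $s_{ij}$ and $s_{ijm}$ need the full 4 × 5 × 5 table, which is not reproduced, so they are not computed.

```python
from math import log2

def s_term(counts, n):
    """(1/n) * sum of c*log2(c) over the given counts (0*log2(0) taken as 0)."""
    return sum(c * log2(c) for c in counts if c > 0) / n

# Table 1: rows are responses 0-4, columns are stimuli 1-4 (entries n_im).
table1 = [[1, 3, 2, 1], [5, 2, 2, 1], [12, 10, 13, 12], [8, 10, 12, 7], [5, 5, 4, 10]]
# Table 2: rows are responses 0-4, columns are presponses 0-4 (entries n_jm).
table2 = [[1, 2, 3, 0, 1], [1, 1, 4, 3, 1], [2, 13, 8, 20, 4], [3, 7, 12, 6, 9], [3, 3, 0, 15, 3]]

n = sum(map(sum, table1))                               # 125 test stimuli
s = log2(n)
s_im = s_term([c for row in table1 for c in row], n)
s_jm = s_term([c for row in table2 for c in row], n)
s_i = s_term([sum(col) for col in zip(*table1)], n)     # stimulus marginal
s_j = s_term([sum(col) for col in zip(*table2)], n)     # presponse marginal
s_m = s_term([sum(row) for row in table1], n)           # response marginal

for name, value in [("s_im", s_im), ("s_jm", s_jm), ("s_i", s_i),
                    ("s_j", s_j), ("s_m", s_m), ("s", s)]:
    print(f"{name:5s} {value:.5f}")
```

Computing with the counts directly, rather than with relative frequencies, avoids the rounding problem with $p \log_2 p$ tables that the text mentions next. Because the text's own figures are rounded, the last printed digit may differ from them by one unit.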


It is evident that these calculations are performed very easily with 
a table of $n\log_2 n$. If he wishes, the reader may also make the computations
with tables of $p\log_2 p$ like those prepared by Newman (8) and Dolansky (3).
The use of $p\log_2 p$ tables for analyzing discrete data is not recommended,
however, because it leads to rounding errors that the table of $n\log_2 n$ avoids.
The complete set of s-terms in the experiment on frequency judgment worked
out as follows:

$$\begin{array}{ll}
s_{ijm} = 1.45211 & s_i = 4.96665\\
s_{ij} = 2.91389 & s_j = 4.79269\\
s_{im} = 2.99246 & s_m = 4.93380\\
s_{jm} = 2.97910 & s = 6.96579
\end{array}$$


In section 4 it was shown that response information, $H'(y)$, can be
analyzed into components

$$H'(y) = H'_{uv}(y) + T'(u;y) + T'(v;y) + A'(uvy). \tag{17}$$
Since $H'(y) = s - s_m$, we see that $H'(y) = 2.03199$ bits. If the subject
had used only the four response keys, this figure could have been
at most 2 bits. The extra information shows that the subject sometimes did
not respond. This can be verified from the right-hand marginals in Tables 1
and 2. The rest of the quantities in equation (17) are easily computed from
s-terms. For example, $H'_{uv}(y)$ is computed from $s_{ij} - s_{ijm}$. We see that
$H'_{uv}(y)$ is 1.46178 bits. This is the part of the response information that
is not accounted for either by the auditory stimuli or the presponses. Consequently,
1.46178/2.03199 or 72 per cent of the response information is
unanalyzed error. Some 28 per cent of the response information must therefore
be due to associations between the subject’s responses and the two predicting
variables.
If we consider the association between auditory stimuli (u) and responses
(y), we have

$$T'(u;y) = s - s_i - s_m + s_{im},$$
$$T'(u;y) = .05780.$$

Thus only .058 bits are transmitted from the frequency stimuli, accounting for
less than 3 per cent of the response information. This is not surprising because
the signal-to-noise ratio was set near the masked threshold and the stimuli
were difficult to hear.

If we consider the association between presponses (v) and current
responses (y), we find a little more transmitted information:

$$T'(v;y) = s - s_j - s_m + s_{jm},$$
$$T'(v;y) = .21840.$$

This value of .218 bits transmitted amounts to some 11 per cent of the
response information.


The last element in equation (17) is the stimulus × response × presponse
interaction, $A'(uvy)$. This is computed from

$$A'(uvy) = -s + s_i + s_j + s_m - s_{ij} - s_{im} - s_{jm} + s_{ijm},$$
$$A'(uvy) = .29401.$$

We see that about 14 per cent of the response information is due to the
interaction. Knowledge of the interaction also permits us to hold one of the
inputs constant while measuring transmission from the other input. For
example, the transmission from stimuli to responses with presponses held
constant is:

$$T'_v(u;y) = s_j - s_{ij} - s_{jm} + s_{ijm} = T'(u;y) + A'(uvy) = .35181.$$
Our calculations for the parts of the response information that we
can analyze with the three-dimensional model lead to weights of approximately
3, 11 and 14 per cent for stimuli, presponses and interaction respectively.
These figures sum to 28 per cent, the amount of transmitted information
we predicted from the size of the noise term. We can also obtain this total
weight directly by computing the information transmitted from both inputs
together. We have

$$T'(u,v;y) = s - s_m - s_{ij} + s_{ijm},$$
$$T'(u,v;y) = .57021.$$


If we now divide this three-dimensional transmitted information by the 
response information, we get back our figure of 28 per cent. 
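All of the quantities in this example follow from the eight reported s-terms by addition and subtraction. The sketch below repeats that arithmetic in Python, starting from the s-term values listed above. It computes each component of (17), checks that the components sum back to $H'(y)$, and expresses each component as a percentage of the response information.

```python
# s-terms reported for the frequency-judgment data.
s_ijm, s_ij, s_im, s_jm = 1.45211, 2.91389, 2.99246, 2.97910
s_i, s_j, s_m, s = 4.96665, 4.79269, 4.93380, 6.96579

H_y = s - s_m                                        # response information H'(y)
H_uv_y = s_ij - s_ijm                                # unanalyzed error H'_uv(y)
T_uy = s - s_i - s_m + s_im                          # stimuli -> responses
T_vy = s - s_j - s_m + s_jm                          # presponses -> responses
A_uvy = -s + s_i + s_j + s_m - s_ij - s_im - s_jm + s_ijm   # interaction, (13)
T_v_uy = s_j - s_ij - s_jm + s_ijm                   # stimuli -> responses, presponse fixed
T_uv_y = s - s_m - s_ij + s_ijm                      # both inputs together, (5)

for name, value in [("H'_uv(y)", H_uv_y), ("T'(u;y)", T_uy), ("T'(v;y)", T_vy),
                    ("A'(uvy)", A_uvy), ("T'(u,v;y)", T_uv_y)]:
    print(f"{name:10s} {value:.5f} bits  {100 * value / H_y:5.1f}% of H'(y)")
print(f"sum of (17) components: {H_uv_y + T_uy + T_vy + A_uvy:.5f} = H'(y) = {H_y:.5f}")
```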

There are several points worth noting about our application of informa- 
tion theory to this experiment. The first is that the analysis is additive. 
The component measures of association plus the measure of error (or noise) 
sum to the response information. Furthermore, the analysis is exact. No 
approximations are involved. The process is very similar to the partition 
of a sum of squares in analysis of variance. As a matter of fact, a notation 
can be worked out in analysis of variance that is exactly parallel to the 
s-notation in multivariate information transmission (4). 

The second point is that information transmission is made to order 
for contingency tables. Measures of transmitted information are zero when 
variables are independent in the contingency-sense (as opposed to the restric- 
tion to linear independence in analysis of variance). In addition, the analysis 
is designed for frequency data in discrete categories, while methods based on 
analysis of variance are not. No assumptions about linearity are introduced 
in multivariate information transmission. Furthermore, when statistical 
tests are developed in a later section, it will be shown that these tests are 
distribution-free in the sense that they are extensions of the familiar chi- 
square test of independence. 

The measure of amount of information transmitted also has certain 
inherent advantages. Garner and Hake (2) and Miller (5) have pointed out 
that the amount of information transmitted is approximately the logarithm 
of the number of perfectly discriminated input-classes. In experiments on
discrimination like the one we have discussed, the measure provides an 
immediate picture of the subject’s discriminative ability. Miller has also 
discussed applications of this property in mental testing and in the general 
theory of measurement. 


6. Independence in Three-Dimensional Transmission 


It is evident from the definition of transmitted information that 
$T'(u,v;y) = 0$ when the output is independent of the joint input, i.e., when

$$n_{ijm} = \frac{n_{ij}\, n_m}{n}. \tag{20}$$

With this kind of independence, we can show that

$$s_{ijm} = s_{ij} + s_m - s.$$

This expression for $s_{ijm}$ may be substituted into (5) to confirm the fact that
$T'(u,v;y) = 0$.
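A quick numerical illustration follows, using hypothetical counts. It builds a table that satisfies (20) exactly, checks that $s_{ijm} = s_{ij} + s_m - s$, and confirms that (5) then gives zero. The joint-input counts and the output marginal were chosen here so that every cell of (20) is an integer.

```python
from collections import Counter
from math import log2

def s_term(table, axes):
    """(1/n) * sum of c*log2(c) over the marginal on `axes`; axes=() gives log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

# Hypothetical joint-input counts n_ij and output marginal n_m, with n = 100.
n_ij = [[20, 40], [12, 28]]
n_m = [25, 75]
n = 100
# Equation (20): n_ijm = n_ij * n_m / n (every division is exact for these numbers).
table = {(i, j, m): n_ij[i][j] * n_m[m] // n
         for i in range(2) for j in range(2) for m in range(2)}

def st(*axes):
    """Shorthand: st() is s, st(2) is s_m, st(0, 1) is s_ij, st(0, 1, 2) is s_ijm."""
    return s_term(table, axes)

s_ijm_pred = st(0, 1) + st(2) - st()            # s_ij + s_m - s
t_joint = st() - st(2) - st(0, 1) + st(0, 1, 2)  # equation (5)
print(st(0, 1, 2), s_ijm_pred, t_joint)
```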


Now suppose that $T'(u,v;y) > 0$ but that v and y are independent,
that is to say,

$$n_{jm} = \frac{n_j\, n_m}{n}. \tag{21}$$

This leads to

$$s_{jm} = s_j + s_m - s.$$

If we substitute for $s_{jm}$ in equation (9), we find that $T'(v;y) = 0$. Equation
(21) does not provide a unique condition for independence between v and y.
To show this, let us pick some value of u and study the v-to-y transmission
at that value of u. We now require that

$$n_{ijm} = \frac{n_{ij}\, n_{im}}{n_i}. \tag{22}$$

If we have (22) for all i, we must have

$$s_{ijm} = s_{ij} + s_{im} - s_i,$$

and it follows from substitution in (10) that $T'_u(v;y) = 0$. This is the situation
in which v and y are independent provided that u is held constant. It is an
interesting case because we can show from (14) that if this kind of independence
happens,

$$A'(uvy) = -T'(v;y).$$

The sign of $T'(v;y)$ must be positive or zero, so that $-T'(v;y)$ must be negative
or zero. Consequently, $A'(uvy)$ can be negative. We see that negative interaction
information is produced when the information transmitted between a
pair of variables is due to a regression on a third variable. Holding the interacting
variable constant causes the transmitted information to disappear.
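Negative interaction is easy to produce on purpose. In the hypothetical table below, v and y each depend on u but are conditionally independent given u, so (22) holds exactly. The sketch checks that $T'_u(v;y)$ vanishes, that $T'(v;y)$ is positive, and that $A'(uvy) = -T'(v;y)$. The counts are illustrative; the helpers follow the s-notation used above.

```python
from collections import Counter
from math import log2

def s_term(table, axes):
    """(1/n) * sum of c*log2(c) over the marginal on `axes`; axes=() gives log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

def t_prime(table, a, b, given=()):
    """T'(a;b), averaged over `given` when it is nonempty."""
    g = tuple(given)
    return (s_term(table, g) - s_term(table, g + a)
            - s_term(table, g + b) + s_term(table, g + a + b))

n_ij = [[80, 20], [20, 80]]     # how v depends on u (hypothetical)
n_im = [[90, 10], [10, 90]]     # how y depends on u (hypothetical)
n_i = [100, 100]
# Equation (22): v and y independent within each value of u; divisions are exact here.
table = {(i, j, m): n_ij[i][j] * n_im[i][m] // n_i[i]
         for i in range(2) for j in range(2) for m in range(2)}
u, v, y = (0,), (1,), (2,)

t_vy = t_prime(table, v, y)                                    # T'(v;y), equation (9)
t_vy_given_u = t_prime(table, v, y, given=u)                   # T'_u(v;y), equation (10)
a_uvy = t_prime(table, u, y, given=v) - t_prime(table, u, y)   # equation (13)
print(t_vy, t_vy_given_u, a_uvy)
```

Here the association between v and y arises entirely through u. Holding u constant removes it, so the interaction term comes out negative.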

If we have the independence defined by (21), we may not necessarily
have the independence defined by (22). Let us suppose that we have both, i.e.,
that we have

$$s_{jm} = s_j + s_m - s,$$
$$s_{ijm} = s_{ij} + s_{im} - s_i.$$

Now we substitute for $s_{jm}$ and $s_{ijm}$ in equation (8):

$$\begin{aligned}
T'_v(u;y) &= s_j - s_{ij} - s_{jm} + s_{ijm}\\
&= s_j - s_{ij} - s_j - s_m + s + s_{ij} + s_{im} - s_i\\
&= s - s_i - s_m + s_{im}\\
&= T'(u;y).
\end{aligned}$$

Both kinds of independence, (21) and (22), together mean that v is not
involved in transmission between u and y. When this happens we do not
have three-dimensional transmission, since u is the only input variable
(provided that no information is transmitted between u and v). As might
be expected, both kinds of independence can be generated from a single
restriction on the data, namely

$$n_{ijm} = \frac{n_{im}}{V},$$

where V is the number of classes in v.
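The single restriction $n_{ijm} = n_{im}/V$ can be checked in the same way. In the sketch below, a hypothetical two-dimensional table $n_{im}$ is spread evenly over the V classes of v. The check confirms that v then carries no information about y or about u, and that holding v constant leaves $T'(u;y)$ unchanged. The counts are chosen here so that they divide evenly by V.

```python
from collections import Counter
from math import log2

def s_term(table, axes):
    """(1/n) * sum of c*log2(c) over the marginal on `axes`; axes=() gives log2(n)."""
    marg = Counter()
    for cell, c in table.items():
        marg[tuple(cell[a] for a in axes)] += c
    n = sum(marg.values())
    return sum(c * log2(c) for c in marg.values() if c > 0) / n

def t_prime(table, a, b, given=()):
    """T'(a;b), averaged over `given` when it is nonempty."""
    g = tuple(given)
    return (s_term(table, g) - s_term(table, g + a)
            - s_term(table, g + b) + s_term(table, g + a + b))

n_im = [[12, 4], [6, 10]]       # hypothetical u-by-y counts, divisible by V
V = 2
table = {(i, j, m): n_im[i][m] // V for i in range(2) for j in range(V) for m in range(2)}
u, v, y = (0,), (1,), (2,)

t_vy = t_prime(table, v, y)                     # should vanish, as under (21)
t_uv = t_prime(table, u, v)                     # no u-v transmission either
t_uy = t_prime(table, u, y)
t_uy_given_v = t_prime(table, u, y, given=v)    # should equal T'(u;y), as shown above
print(t_vy, t_uv, t_uy, t_uy_given_v)
```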

We have studied the case where v is independent of y. We could have 
had u independent of y, or u independent of v. The results are analogous to 
those we have presented. 


7. Correlated Sources of Information 


Three-dimensional transmitted information, T’(u,v;y), accounts for 
only part of the total amount of association in a three-dimensional contingency 
table. It does not exhaust all the association in the table because it neglects 
the association between the inputs. When this association is considered, i.e., 
when all the relations in the contingency table are represented, we are led to 
an equation that is very useful for generating the components of multivariate 
transmission. Consider 


$$ C'(u,v,y) = H'(u) + H'(v) + H'(y) - H'(u,v,y). \qquad (23) $$

If we add and subtract $H'(u,v)$, we obtain

$$ \begin{aligned} C'(u,v,y) &= T'(u;v) + T'(u,v;y), \\ C'(u,v,y) &= T'(u;v) + T'(u;y) + T'(v;y) + A'(uvy). \end{aligned} \qquad (24) $$


We see that C’(u,v,y) generates all possible components of the three corre- 
lated information-sources, u, v, and y. 
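Equation (24) is an identity in the cell counts, so it holds for any table. The sketch checks it on a random hypothetical table. It also confirms that $A'(uvy)$ comes out the same whichever variable is held constant. The helpers are the same assumed bit-valued helpers as before.

```python
import numpy as np

def H(N, axes):
    """Sample information (bits) of the marginal of count table N over `axes`."""
    drop = tuple(ax for ax in range(N.ndim) if ax not in axes)
    c = N.sum(axis=drop) if drop else N
    c = np.asarray(c, dtype=float).ravel()
    n = c.sum()
    c = c[c > 0]
    return float(np.log2(n) - (c * np.log2(c)).sum() / n)

def T(N, a, b, cond=()):
    """T'_cond(a;b) in bits."""
    cond = tuple(cond)
    return H(N, (a,) + cond) + H(N, (b,) + cond) - H(N, cond) - H(N, (a, b) + cond)

rng = np.random.default_rng(1)
N = rng.integers(1, 30, size=(3, 4, 2)).astype(float)   # hypothetical counts, axes (u, v, y)
u, v, y = 0, 1, 2

C = H(N, (u,)) + H(N, (v,)) + H(N, (y,)) - H(N, (u, v, y))   # eq. (23)
A_uvy = T(N, v, y, (u,)) - T(N, v, y)                         # A'(uvy)
T_uv_y = H(N, (u, v)) + H(N, (y,)) - H(N, (u, v, y))          # T'(u,v;y)
rhs_24 = T(N, u, v) + T(N, u, y) + T(N, v, y) + A_uvy          # right side of (24)
print(f"C' = {C:.6f}, T'(u;v) + T'(u,v;y) = {T(N, u, v) + T_uv_y:.6f}, (24) = {rhs_24:.6f}")
```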


8. Four-Dimensional Transmitted Information 


It will be instructive to extend our measures one step further, i.e., to 
transmitted information with three input variables, since from that point 
results can be generalized easily to an N-dimensional input. For simplicity 
we shall restrict our development to the case of a channel with a multivariate
input and a univariate output. The more general case with N inputs and 
M outputs does not present any special problems, and can be constructed 
with no difficulty once the rules become clear. 

Let us add a new variable w to the bivariate input, u,v. The joint input 
is now u,v,w. We suppose that w sends signals h = 1, 2, 3, ..., W. This gives us four sources of information, u, v, w, and y. We can proceed to define a four-way interaction information, $A'(uvwy)$, as follows:

$$ A'(uvwy) = A'_w(uvy) - A'(uvy). $$


We have already defined $A'(uvy)$. The definition of $A'_w(uvy)$ will be similar, except that the subscript w indicates that $A'(uvy)$ is to be averaged over w. As we have already noted, this is accomplished by adding the subscript h to each of the s-terms that make up $A'(uvy)$. Consequently

$$ A'_w(uvy) = -s_h + s_{hi} + s_{hj} + s_{hm} - s_{hij} - s_{him} - s_{hjm} + s_{hijm}. \qquad (25) $$


It is readily shown that $A'(uvwy)$ is symmetrical in the sense that it does not matter which variable is chosen for averaging, i.e.,

$$ \begin{aligned} A'(uvwy) &= A'_u(vwy) - A'(vwy) \\ &= A'_v(uwy) - A'(uwy) \\ &= A'_w(uvy) - A'(uvy) \\ &= A'_y(uvw) - A'(uvw). \end{aligned} \qquad (26) $$


We see that A’(uvwy) is the amount of information gained (or lost) in trans- 
mission by controlling a fourth variable when any three of the variables are 
already known. 

If we examine all possible associations in a four-dimensional contingency 
table, we obtain 


$$ \begin{aligned} C'(u,v,w,y) ={}& T'(u;v) + T'(u;w) + T'(u;y) + T'(v;w) + T'(v;y) + T'(w;y) \\ &+ A'(uvw) + A'(uvy) + A'(uwy) + A'(vwy) + A'(uvwy), \end{aligned} \qquad (27) $$

where

$$ C'(u,v,w,y) = H'(u) + H'(v) + H'(w) + H'(y) - H'(u,v,w,y). $$

Equation (27) can be proved by expanding both sides in s-notation. 
It turns out that in the general case, $C'(u,v,w,\ldots,y)$ is expanded by writing down T-terms for all possible pairs of variables, and A-terms for all possible combinations of three, four, and more variables.

Four-dimensional transmitted information from u,v,w to y, i.e., $T'(u,v,w;y)$, can be written as follows:

$$ T'(u,v,w;y) = H'(y) + H'(u,v,w) - H'(u,v,w,y). \qquad (28) $$
The same arguments are used to justify (28) as were used in the case of (4) in three-dimensional transmission. To find the components of $T'(u,v,w;y)$, we note that

$$ T'(u,v,w;y) = C'(u,v,w,y) - C'(u,v,w). \qquad (29) $$

This means that $T'(u,v,w;y)$ contains all the components of $C'(u,v,w,y)$ except the correlations among the inputs. Consequently the components of $T'(u,v,w;y)$ are

$$ \begin{aligned} T'(u,v,w;y) ={}& T'(u;y) + T'(v;y) + T'(w;y) \\ &+ A'(uvy) + A'(uwy) + A'(vwy) + A'(uvwy). \end{aligned} \qquad (30) $$


The components of T’(u,v,w;y) are shown in schematic form in Figure 2. 


[Figure 2: Schematic diagram of the components of four-dimensional transmitted information, T'(u,v,w;y), with three transmitters and a single receiver.]
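Equations (26), (27) and (30) are also identities in the counts, so they can be spot-checked on a random hypothetical four-way table. In the sketch below, the recursive function `A` follows the definition used here: two variables give $T'$, and more variables give the averaged interaction minus the unaveraged one, with the first listed variable being the one averaged over. The function names and the random table are assumptions made for illustration.

```python
import itertools
import numpy as np

def H(N, axes):
    """Sample information (bits) of the marginal of count table N over `axes`."""
    drop = tuple(ax for ax in range(N.ndim) if ax not in axes)
    c = N.sum(axis=drop) if drop else N
    c = np.asarray(c, dtype=float).ravel()
    n = c.sum()
    c = c[c > 0]
    return float(np.log2(n) - (c * np.log2(c)).sum() / n)

def A(N, vars_, cond=()):
    """Two axes give T'_cond(a;b); more axes give A'_x(rest) - A'(rest),
    where x (the first axis listed) is the variable averaged over."""
    cond = tuple(cond)
    if len(vars_) == 2:
        a, b = vars_
        return H(N, (a,) + cond) + H(N, (b,) + cond) - H(N, cond) - H(N, (a, b) + cond)
    x, rest = vars_[0], tuple(vars_[1:])
    return A(N, rest, cond + (x,)) - A(N, rest, cond)

u, v, w, y = 0, 1, 2, 3
rng = np.random.default_rng(2)
N = rng.integers(1, 20, size=(2, 3, 2, 3)).astype(float)   # hypothetical counts on axes (u, v, w, y)

# Eq. (27): every T-term (pairs) plus every A-term (triples and the quadruple).
C4 = sum(H(N, (ax,)) for ax in (u, v, w, y)) - H(N, (u, v, w, y))
expansion = sum(A(N, combo) for r in (2, 3, 4) for combo in itertools.combinations((u, v, w, y), r))

# Eqs. (28) and (30): T'(u,v,w;y) and its seven components.
T_uvw_y = H(N, (u, v, w)) + H(N, (y,)) - H(N, (u, v, w, y))
components = sum(A(N, c) for c in [(u, y), (v, y), (w, y), (u, v, y), (u, w, y), (v, w, y), (u, v, w, y)])

# Eq. (26): the result does not depend on which variable is averaged over.
A4 = [A(N, order) for order in [(u, v, w, y), (v, u, w, y), (w, u, v, y), (y, u, v, w)]]
print(f"(27) residual {C4 - expansion:.1e}, (30) residual {T_uvw_y - components:.1e}")
```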


If it happens that 


$$ n_{hijm} = \frac{n_{ijm}}{W}, $$


where W is the number of classes in w, all the components of C’(u,v,w,y) that 
are functions of w drop out and C’(u,v,w,y) = C’(u,v,y). In similar fashion, 
C’(u,v,y) can be reduced to C’(u,y). This is precisely what we did in the 
analysis of independence in three-dimensional transmitted information. Since 
C’(u,y) = T’(u;y), we see that all cases of transmission with multivariate 
inputs can be related to the bivariate case. 

With three inputs controlled, we are ready to extend the analysis of response information in section 4 a step further. We have

$$ H'(y) = H'_{uvw}(y) + T'(u,v,w;y). \qquad (31) $$


Equation (31) says that we can measure the effects in response information due to the three inputs. This is evident from the fact that (30) tells us how to expand $T'(u,v,w;y)$ in its components. In addition we know that

$$ H'_{uv}(y) = H'_{uvw}(y) + T'_{uv}(w;y), \qquad (32) $$

where

$$ T'_{uv}(w;y) = T'(w;y) + A'(uwy) + A'(vwy) + A'(uvwy). \qquad (33) $$


We see that controlling w in addition to u and v enables us to rescue the information transmitted between w and y from the noise, and to replace $H'_{uv}(y)$ with a better estimate of noise information, namely $H'_{uvw}(y)$.

The transition to an N-dimensional input is now evident. In general, 
we have 


$$ H'(y) = H'_{uvw\cdots z}(y) + T'(u,v,w,\ldots,z;y). \qquad (34) $$

The $(N+1)$-dimensional transmitted information, $T'(u,v,w,\ldots,z;y)$, can then be expanded in its components in the manner that we have described.


9. Asymptotic Distributions 


Miller and Madow (6) have shown that sample information is related 
to the likelihood ratio. Following Miller and Madow, we can show that the 
large sample distribution of the likelihood ratio may be used to find approxi- 
mate distributions for the quantities involved in multivariate transmission. 

Consider, for example, three-dimensional sample transmitted-information, $T'(u,v;y)$. We can test the hypothesis that $T(u,v;y)$, the population value estimated by $T'(u,v;y)$, is equal to zero. This is equivalent to the hypothesis that

$$ p(i,j,m) = p(i,j)\cdot p(m), \qquad (35) $$


since $T(u,v;y)$ is zero when input and output are independent. This hypothesis leads to the likelihood ratio [see reference (7)],

$$ \lambda = \frac{\prod_{i,j} (n_{ij})^{n_{ij}} \prod_{m} (n_m)^{n_m}}{n^{n} \prod_{i,j,m} (n_{ijm})^{n_{ijm}}}. \qquad (36) $$

If we take logs, we obtain

$$ -2\log_e \lambda = 1.3863\, n\, [s - s_{ij} - s_m + s_{ijm}] = 1.3863\, n\, T'(u,v;y). \qquad (37) $$
For large samples, $-2\log_e\lambda$ has approximately a $\chi^2$ distribution with $(UV-1)(Y-1)$ degrees of freedom when the null hypothesis (35) is true. Thus $1.3863\,nT'(u,v;y)$ is distributed approximately like $\chi^2$ if $T(u,v;y)$ is equal to zero.
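The constant 1.3863 is $2\log_e 2$. It converts information measured in bits into the natural-log scale of the likelihood ratio. The sketch below computes $-2\log_e\lambda$ directly from (36) for a random hypothetical table and compares it with $1.3863\,nT'(u,v;y)$. It assumes the s-terms are $s = \log_2 n$ and $s_x = (1/n)\sum n_x \log_2 n_x$, which is consistent with $C'(u,v,y) = 2s + s_{ijm} - s_i - s_j - s_m$ as used below.

```python
import numpy as np

def sum_clnc(c):
    """Sum of c * ln(c) over the positive cells of a count array."""
    c = np.asarray(c, dtype=float).ravel()
    c = c[c > 0]
    return float((c * np.log(c)).sum())

rng = np.random.default_rng(3)
N = rng.integers(1, 25, size=(2, 3, 4)).astype(float)   # hypothetical n_ijm on axes (u, v, y)
U, V, Y = N.shape
n = N.sum()
n_ij = N.sum(axis=2)
n_m = N.sum(axis=(0, 1))

# Natural log of the likelihood ratio (36).
log_lam = sum_clnc(n_ij) + sum_clnc(n_m) - n * np.log(n) - sum_clnc(N)
G = -2.0 * log_lam

# Assumed s-terms in bits, then T'(u,v;y) = s - s_ij - s_m + s_ijm as in (37).
s = np.log2(n)
s_ij = sum_clnc(n_ij) / (n * np.log(2))
s_m = sum_clnc(n_m) / (n * np.log(2))
s_ijm = sum_clnc(N) / (n * np.log(2))
T_uv_y = s - s_ij - s_m + s_ijm

df = (U * V - 1) * (Y - 1)
print(f"-2 ln(lambda) = {G:.4f}, 1.3863 n T' = {2 * np.log(2) * n * T_uv_y:.4f}, df = {df}")
```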

A more important problem involves testing suspected information 
sources. Suppose in our three-dimensional example, we assume that 


$$ p(i,j,m) = p(i)\cdot p(j)\cdot p(m). \qquad (38) $$

This hypothesis leads to the likelihood ratio for complete independence in a three-dimensional contingency table,

$$ \lambda = \frac{\prod_{i} (n_i)^{n_i} \prod_{j} (n_j)^{n_j} \prod_{m} (n_m)^{n_m}}{n^{2n} \prod_{i,j,m} (n_{ijm})^{n_{ijm}}}. \qquad (39) $$

After we take logs we find that

$$ \begin{aligned} -2\log_e\lambda &= 1.3863\,n\,[3s - s_i - s_j - s_m - s + s_{ijm}] \\ &= 1.3863\,n\,[H'(u) + H'(v) + H'(y) - H'(u,v,y)] \\ &= 1.3863\,n\,C'(u,v,y). \end{aligned} \qquad (40) $$

For large samples $-2\log_e\lambda$ has approximately a $\chi^2$ distribution with $(UVY-1)-(U-1)-(V-1)-(Y-1)$ degrees of freedom when the null hypothesis is true.

We also know that 


$$ C'(u,v,y) = T'(u;y) + T'(v;y) + T'_y(u;v). \qquad (41) $$

The likelihood ratio can be used to show that $1.3863\,nT'(u;y)$ and $1.3863\,nT'(v;y)$ are asymptotically distributed like $\chi^2$ with $(U-1)(Y-1)$ and $(V-1)(Y-1)$ degrees of freedom, respectively, if $T(u;y)$ and $T(v;y)$ are zero. To find the asymptotic distribution of $T'_y(u;v)$, we make the following hypothesis:

$$ p(i,j,m) = p(m)\cdot p_m(i)\cdot p_m(j), \qquad (42) $$

where $p_m(j)$ is the conditional probability of j given m, and similarly for $p_m(i)$.
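Now we have the ratio

$$ \lambda = \frac{\prod_{i,m} (n_{im})^{n_{im}} \prod_{j,m} (n_{jm})^{n_{jm}}}{\prod_{m} (n_m)^{n_m} \prod_{i,j,m} (n_{ijm})^{n_{ijm}}}, \qquad (43) $$

and

$$ -2\log_e\lambda = 1.3863\,n\,[s_m - s_{im} - s_{jm} + s_{ijm}] = 1.3863\,n\,T'_y(u;v). \qquad (44) $$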
In this case $-2\log_e\lambda$ has $Y(U-1)(V-1)$ degrees of freedom. In view of (41) we can write

$$ 1.3863\,n\,C'(u,v,y) = 1.3863\,n\,[T'(u;y) + T'(v;y) + T'_y(u;v)]. \qquad (45) $$

The quantities on the right side of (45) have degrees of freedom that sum to $(UVY - U - V - Y + 2)$. Since this is the same number of degrees of freedom as on the left-hand side of (45), the quantities on the right side of (45) are asymptotically independent if the null hypothesis,

$$ p(i,j,m) = p(i)\cdot p(j)\cdot p(m), $$

is true.
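The degrees-of-freedom bookkeeping behind (45) can be checked mechanically. The sketch confirms that the three components sum to $UVY - U - V - Y + 2$ over a range of class counts. The specific values U = 4, V = 5, Y = 5 are not stated in this section. They are an inference: they are a set of class counts that reproduce the 12, 16, 60 and 88 degrees of freedom appearing in the tables and text of the example.

```python
def df_components(U, V, Y):
    """Degrees of freedom of the terms in (45), given the class counts U, V, Y."""
    df_uy = (U - 1) * (Y - 1)                                  # T'(u;y)
    df_vy = (V - 1) * (Y - 1)                                  # T'(v;y)
    df_uv_given_y = Y * (U - 1) * (V - 1)                      # T'_y(u;v)
    df_total = (U * V * Y - 1) - (U - 1) - (V - 1) - (Y - 1)   # C'(u,v,y)
    return df_uy, df_vy, df_uv_given_y, df_total

# The three components add up to the total for every class count checked here.
for U in range(2, 9):
    for V in range(2, 9):
        for Y in range(2, 9):
            a, b, c, total = df_components(U, V, Y)
            assert a + b + c == total == U * V * Y - U - V - Y + 2

print(df_components(4, 5, 5))  # (12, 16, 60, 88)
```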

This means that as an approximation we can test $T'(u;y)$, $T'(v;y)$ and $T'_y(u;v)$ simultaneously for significance under the null hypothesis we have stated. The test is very similar to an analysis of variance. We can see the similarity by applying the test to the data from our example in section 5. The significance tests will be made on the quantities in equation (45). To do this we need to compute $C'(u,v,y)$ and $T'_y(u;v)$, since these terms were not discussed in section 5. First we note that $C'(u,v,y)$ is the total amount of association in the stimulus × response × presponse table. We have

$$ C'(u,v,y) = 2s + s_{ijm} - s_i - s_j - s_m = .69055. $$

We also need $T'_y(u;v)$, the information transmitted from presponses to stimuli with responses held constant. This measures how successfully the presponses predict the auditory stimuli. Since stimuli were chosen at random, we do not expect much transmitted information here. The computation goes as follows:

$$ T'_y(u;v) = s_m - s_{im} - s_{jm} + s_{ijm} = T'(u;v) + A'(uvy) = .41435. $$


We may now put our computed values for $C'(u,v,y)$, $T'(u;y)$, $T'(v;y)$ and $T'_y(u;v)$ into equation (45) and perform the $\chi^2$ tests. The results are summarized in Table 3. We have not attempted to calculate the significance level of $C'(u,v,y)$ because we do not have enough data to sustain the 88 degrees of freedom. The same criticism can probably be leveled at our test for $T'_y(u;v)$. In any case Table 3 shows that the only significant effect in the experiment is the presponse-response association.

One interesting fact that the analysis brings out clearly is that we cannot decide whether an amount of transmitted information is big or small without knowing its degrees of freedom. In our example we find that $T'_y(u;v)$ = .414 bits, while $T'(v;y)$ = .218 bits. Yet $T'(v;y)$ is significant and $T'_y(u;v)$ is not. The reason lies in the difference in degrees of freedom. Miller and Madow (6) have discussed the amount of statistical bias in information measures due to degrees of freedom, and have suggested corrections.

TABLE 3

Table of Transmitted Information

Transmission          Component     -2 log_e λ    d.f.    P
Stimulus-Response     T'(u;y)          10.016      12    >.50
Presponse-Response    T'(v;y)          37.844      16    <.01
Presponse-Stimulus    T'_y(u;v)        71.802      60     .14
Total                 C'(u,v,y)       119.664      88

In Table 3, we tested $T'_y(u;v)$, the association between presponses and stimuli with responses held constant. This association is broken down still further in Table 4. No probability is estimated in Table 4 for the interaction term, $A'(uvy)$, because its asymptotic distribution is not chi-square. All A-terms are distributed like the difference of two variables, each of which has the chi-square distribution. The distribution of this difference is evidently not chi-square, because the difference can be negative. Its density function has been derived by Pearson, Stouffer, and David (9), but the writer has been unable to find a table of the integral. In some cases the problem can be circumvented by combining A-terms with T-terms to make new T-terms. [See, for example, equation (33).] However, in other cases the interactions are genuinely interesting in their own right and should be tested directly. These cases can be treated when adequate tables become available.

TABLE 4

Table of Transmitted Information

Transmission          Component     -2 log_e λ    d.f.    P
Presponse-Stimulus    T'(u;v)          20.853      12    >.05
Interaction           A'(uvy)          50.948      48     **
Total                 T'_y(u;v)        71.802      60     .14

** Probability not estimated.
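The P column can be roughly reproduced without chi-square tables. The sketch below uses the Wilson–Hilferty cube-root normal approximation to the chi-square upper tail. That approximation is a standard technique substituted here for illustration; it is not the method used to prepare Tables 3 and 4. The statistics and degrees of freedom are those listed in the tables. The interaction row is omitted because, as noted above, it is not chi-square.

```python
import math

def chi2_sf_wh(x, df):
    """Approximate P(chi-square with df degrees of freedom > x), Wilson-Hilferty."""
    h = 2.0 / (9.0 * df)
    zscore = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(zscore / math.sqrt(2.0))

# -2 log_e(lambda) and d.f. as listed in Tables 3 and 4.
table_rows = {
    "T'(u;y)": (10.016, 12),
    "T'(v;y)": (37.844, 16),
    "T'_y(u;v)": (71.802, 60),
    "T'(u;v)": (20.853, 12),
}
approx_p = {name: chi2_sf_wh(x, df) for name, (x, df) in table_rows.items()}
for name, (x, df) in table_rows.items():
    print(f"{name:10s} chi2 = {x:7.3f}  df = {df:2d}  P ~ {approx_p[name]:.3f}")
```

The larger statistic, 71.802 on 60 d.f., is less significant than 37.844 on 16 d.f., which is the point made above about degrees of freedom.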
A LEAST SQUARES SOLUTION FOR SUCCESSIVE INTERVALS
ASSUMING UNEQUAL STANDARD DEVIATIONS*†


HAROLD GULLIKSEN 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


A least squares solution has been presented for the Law of Categorical 
Judgment and the Method of Successive Intervals, which is formally equi- 
valent to Horst’s solution for the matrix of incomplete data. A simplified 
approximation which is recommended for use with any matrix of complete 
data is also given. A procedure, analogous to that originally devised by 
Thurstone and presented by Saffir, is outlined for the typical experimental 
matrix of incomplete data. It is essentially a “point and slope” method of fitting a straight line which comes rather directly from the least squares solution.


Introduction 


The method of successive intervals and the method of paired comparisons 
are two basic psychophysical methods, used for the scaling of stimuli, and 
also for measuring the ability of individuals. The method of paired com- 
parisons is the more fundamental from a theoretical point of view; however, 
the experimental labor of the method is considerable. The method of suc- 
cessive intervals is generally considered to be more feasible experimentally. 

Thurstone’s absolute scaling method for psychological tests uses a 
theory that is basically the same as that of successive intervals [see Thurstone, L. L., (12, 17)]. The “scaled scores” procedure of the Cooperative
Test Service described by Flanagan (2) also uses a basically similar approach. 


The Experimental Method of Successive Intervals 


The Method of Successive Intervals is an experimental procedure for 
gathering data. In this procedure n stimuli are sorted into k + 1 categories 
(n > k + 1) with respect to some specified attribute. The categories can be 
arranged in a rank order, each stimulus in category g being judged to have 
a psychological scale value which is “less than” the scale value for any stimulus
in the category g + 1. This statement holds for all values of g from 1 to k. 
The sorting procedure is repeated N times, possibly by the same person or possibly by N different persons who are assumed to belong to a homogeneous group. This experimental procedure gives, for each stimulus, a frequency distribution over several of the k + 1 categories. It is hoped that such data may be accounted for by a single psychological scale, with the psychological scale values associated with each stimulus being distributed along this scale with a mean and a standard deviation characteristic of that stimulus. The mean of the distribution of such values associated with a given stimulus is taken as the “scale value” for that stimulus. The standard deviation of the distribution has been called the “discriminal dispersion” by Thurstone (14).

*This study was supported in part by Office of Naval Research Contract N6onr 270-20 with Princeton University.

†The author wishes to acknowledge helpful suggestions and comments received in discussions of this problem with Max Woodbury, Frederic Lord, Frederick Mosteller, Warren Torgerson, Robert Abelson, and Bert F. Green, Jr. Thanks are due to Mrs. Gertrude Diederich and Irving Abrams for work on the computing necessary for the illustrative applications.

This method of successive intervals was devised by L. L. Thurstone 
and first published by Saffir in 1937. It should be noted that the method 
is a basic and general one in the sense that from a mathematical point of 
view many, if not most, other psychophysical methods can be regarded as
special cases of the method of successive intervals. The method of single 
stimuli [Wever and Zener (20), Volkmann (19)] is the special case in which 
it is assumed that the psychological separation of the intervals is correctly 
indicated by the subject. The method of equal-appearing intervals is the 
special case in which it is assumed that the subject has the ability to make 
all the intervals equal whenever he is instructed to do so. The Q-technique, 
described by Stephenson (8, 9), begins with a sorting procedure like suc- 
cessive intervals with rigid restrictions on the number of stimuli that may be 
placed in each pile. Since the subsequent calculations also assume that the 
intervals are equal, the Q-technique may be regarded as an equal-appearing 
intervals method. The method of graded dichotomies, recently described by 
Attneave (1), is a slightly modified computational procedure for successive 
intervals. With errorless data the method of successive intervals and graded 
dichotomies would give identical results. In order to proceed economically 
with multi-dimensional scaling procedures [Gulliksen (3)], some type of 
successive intervals method is highly desirable [see Torgerson (18)]. 

It should also be noted that this paper treats the case in which the 
stimulus dispersions may be unequal. The case of unequal discriminal dis- 
persions is an important one in view of the fact that the difference between 
Weber’s law, Fechner’s law, and the law of comparative judgment hinges on 
this point [Thurstone (15, 16)]. As Thurstone has shown, equally often 
noticed differences are “equal” only when discriminal dispersions are equal.
It is also probable that the difference between so-called “additive” and 
“substitutive” magnitudes [Stevens (10) and Stevens and Volkmann (11)] 
will be found to be characterized by equality or inequality of discriminal 
dispersions at different points on the psychological continuum. The method 
of paired comparisons and law of comparative judgment described by Thur- 
stone [(13, 14)] may perhaps be regarded as the special case of the successive 
intervals method in which the mean values of each of the stimuli used furnish 
the dividing points between the different piles. 
The Solution for Errorless Data


The Law of Categorical Judgment used for scaling data collected by the 
method of successive intervals assumes the existence of a set of psychological 
scale values designated by t. The k dividing points between each of the k + 1 categories are designated by $t_g$ (g = 1, ..., k) on this scale, as shown in Figure 1.

First we will consider the problem of categorical judgment with errorless data.

[Figure 1: Illustrating the Method of Successive Intervals for Errorless Data]

120 PSYCHOMETRIKA

Any one stimulus, i (i = 1, ..., n), when repeated gives rise to a distribution of psychological scale values which have a mean ($m^*_i$) and a standard deviation or discriminal dispersion ($s^*_i$). In order to use this scaling procedure it is necessary that the range of psychological scale values associated with each stimulus be large enough to cover several adjacent categories. It is assumed that the distribution function over the t scale is the same for each stimulus. In the studies so far reported it has been assumed that this function is the normal curve.

Let $p^*_{ig}$ designate the proportion of judgments in which stimulus i is placed below category boundary $t^*_g$. Then

$$ p^*_{ig} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z^*_{ig}} e^{-x^2/2}\,dx, $$

where

$$ z^*_{ig} = \frac{t^*_g - m^*_i}{s^*_i}, $$

or equivalently where

$$ m^*_i + s^*_i z^*_{ig} - t^*_g = 0. $$

Either of these equations may be taken as a statement of the Law of Categorical Judgment. Thus each $z^*_{ig}$ corresponds to a given $p^*_{ig}$ and each $p^*_{ig}$ corresponds to a given $z^*_{ig}$. The $z^*_{ig}$ are transformed scale values which express $t^*_g$ in terms of the mean and the discriminal dispersion for stimulus i.

It should be noted that a given value of $t^*_g$ may be expressed in terms of each of the stimuli used. Thus

$$ t^*_g = m^*_i + s^*_i z^*_{ig} = m^*_j + s^*_j z^*_{jg}. $$

That is to say, $z^*_{ig}$ is a linear function of $z^*_{jg}$ for each possible pair of stimuli i and j.

It should also be remarked that the successive intervals scaling procedure may utilize any distribution function,

$$ p^*_{ig} = \int_{-\infty}^{z^*_{ig}} f(x)\,dx. $$

All that is essential is a one-to-one correspondence between $p^*_{ig}$ and $z^*_{ig}$.

For errorless data the solution would be straightforward. For each experimental $p^*_{ig}$ a corresponding $z^*_{ig}$ would be read from the appropriate frequency distribution. Clearly the solution can be determined only within a linear transformation. Thus some one stimulus (say, i = 1) may be arbitrarily selected as a standard. For this stimulus some arbitrary values would be selected for $m^*_1$ and $s^*_1$; say $m^*_1 = 0$, $s^*_1 = 1$. Thus

$$ t^*_g = z^*_{1g} \qquad (g = 1, \ldots, k) $$

and the values of $m^*_i$ and $s^*_i$ for each of the other stimuli would then be determined by using the equations corresponding to any two values of g.
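The errorless procedure can be sketched in a few lines. Errorless $z^*_{ig}$ are generated from hypothetical "true" boundaries, means and dispersions; none of these values come from the paper. Stimulus 1 is fixed as the standard. Each other stimulus's $m_i$ and $s_i$ are solved from just the first two boundaries. The recovered values equal the true ones after the linear transformation that gives stimulus 1 a mean of 0 and a dispersion of 1.

```python
import numpy as np

# Hypothetical "true" values, chosen only for illustration.
t_true = np.array([-1.0, -0.2, 0.5, 1.3])   # boundaries t*_g
m_true = np.array([0.4, -0.3, 0.9])         # means m*_i
s_true = np.array([0.8, 1.0, 1.5])          # discriminal dispersions s*_i

# Errorless data from the Law of Categorical Judgment: z*_ig = (t*_g - m*_i) / s*_i
z = (t_true[None, :] - m_true[:, None]) / s_true[:, None]

# Stimulus 1 (row 0) is the standard with m*_1 = 0, s*_1 = 1, so t*_g = z*_1g.
t = z[0].copy()

# Solve m_i + s_i z_ig = t_g for every stimulus from two boundaries (the first and second).
g1, g2 = 0, 1
s = (t[g2] - t[g1]) / (z[:, g2] - z[:, g1])
m = t[g1] - s * z[:, g1]

print("t:", np.round(t, 4))
print("m:", np.round(m, 4))
print("s:", np.round(s, 4))
```

With errorless data any pair of boundaries gives the same answer, and every remaining boundary is then satisfied exactly. Fallible data break this, which is what motivates the least squares treatment below.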


The Problem for Fallible Data 


For fallible data we are given experimentally $p_{ig}$, which are estimates of $p^*_{ig}$. Correspondingly, the $z_{ig}$, as defined by

$$ p_{ig} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_{ig}} e^{-x^2/2}\,dx \qquad (i = 1, \ldots, n;\ g = 1, \ldots, k), $$

or alternatively by any distribution function

$$ p_{ig} = \int_{-\infty}^{z_{ig}} f(x)\,dx, $$

are estimates of $z^*_{ig}$. It should be noted that for the normal distribution function only k usable proportions (the dividing points between adjacent piles) will be yielded by the k + 1 piles. The proportion “zero” marking the bottom of the lowest pile and “one” marking the top of the highest pile correspond to minus infinity and plus infinity, respectively, and hence are not usable in the least squares solution. The $z_{ig}$ need not be perfectly consistent. That is to say, it is necessary only that $z_{ig}$ and $z_{jg}$ be approximately linearly related to each other.

Since each of the n stimuli has been sorted into k + 1 categories N
different times, the basic experimental information is given in an n by k + 1 
frequency table which shows the number of times each of the n stimuli has 
been placed in each of the k + 1 categories. Such a set of data is shown in 
Table 1 where n = 16, N = 133 and k + 1 = 8. These data are selected 
from Saffir (7) by deleting some of the stimuli and combining the end cate- 
gories in order to obtain a matrix with no zero frequencies in the extreme 
categories. 

From the frequencies of Table 1 a table of cumulative proportions can be constructed showing $p_{ig}$, the proportion of judgments in which the stimulus was judged to be below each of the division points (A to G) between two adjacent categories. Since the proportion below the lower bound of the lowest category is always zero and that below the upper bound of the highest category is necessarily 1.00, these extreme proportions contribute no information and may be ignored in the analysis.
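The step from a row of frequencies to the $z_{ig}$ of Table 2 can be sketched with Python's standard library. The frequencies below are hypothetical: they have the same shape as Table 1 (eight categories, 133 judgments) but are not Saffir's data. `statistics.NormalDist().inv_cdf` plays the role of the normal table with mean zero and unit variance.

```python
from itertools import accumulate
from statistics import NormalDist

# Hypothetical frequencies for one stimulus over k + 1 = 8 categories (N = 133 judgments).
freq = [5, 10, 20, 30, 30, 20, 10, 8]
N = sum(freq)

# Cumulative counts below each of the k = 7 interior boundaries A..G. The last
# cumulative count (proportion 1.00) is dropped, as is the implicit 0 below the lowest pile.
cum = list(accumulate(freq))[:-1]
p = [c / N for c in cum]                      # p_ig
z = [NormalDist().inv_cdf(pi) for pi in p]    # z_ig, unit normal deviates

for label, pi, zi in zip("ABCDEFG", p, z):
    print(f"{label}: p = {pi:.4f}, z = {zi:+.2f}")
```

A zero frequency in an extreme category would make a cumulative proportion exactly 0 or 1, and `inv_cdf` raises `StatisticsError` for such values. This mirrors why the end categories were combined so that no extreme frequency is zero.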

From the table of cumulative proportions, Table 2 showing $z_{ig}$ has been constructed by the use of a normal table with any arbitrary mean, such as zero, and some arbitrary variance, such as unity. The rest of the work in this paper is in terms of the $z_{ig}$. The problem is to utilize the experimental $z_{ig}$ values in determining a mean scale value ($m_i$) and a discriminal dispersion ($s_i$) for each stimulus and the scale value of the division points ($t_g$) between adjacent categories.
TABLE 1
Frequency Distributions of 133 Judgments for Each of 16 Nationalities in Eight Categories

(Data selected from Saffir, Psychometrika, 2, 1937, pp. 179-198)
TABLE 2
Table of $z_{ig}$ Values

(Categories 1-2, 3, 4, 5, 6, 7, 8, 9-10; the boundaries A to G lie between adjacent categories, A between 1-2 and 3, and G between 8 and 9-10.)

Stimulus      A       B       C       D       E       F       G        Σz      Mean      S.D.
  1         .42    1.17    1.69    1.88    2.17    2.43    2.43     12.19    1.7414     .6789
  2         .12     .91    1.62    1.88    2.00    2.00    2.17     10.70    1.5286     .6923
  3        -.01     .75    1.07    1.55    1.62    2.17    2.43      9.58    1.3686     .7780
  4        -.16     .86    1.62    1.78    2.00    2.43    2.43     10.96    1.5657     .8625
  5        -.68    -.07     .46    1.25    1.62    2.17    2.43      7.18    1.0257    1.0767
  6       -1.25    -.52     .01     .40     .91    1.49    1.88      2.92     .4171    1.0252
  7       -1.00    -.59    -.24     .32     .73    1.14    1.49      1.85     .2643     .8512
  8       -1.14    -.70    -.26     .24     .75    1.10    1.44      1.43     .2043     .8839
  9       -1.44    -.94    -.32     .07     .50     .94    1.25       .06     .0086     .9071
 10       -1.78   -1.21    -.83    -.40     .10     .54    1.04     -2.54    -.3629     .9214
 11       -1.10    -.70    -.38    -.16     .20     .48     .83      -.83    -.1186     .6246
 12       -2.00   -1.44   -1.04    -.73    -.16     .34     .91     -4.12    -.5886     .9436
 13       -2.43   -1.49   -1.21    -.78    -.38    -.01     .46     -5.84    -.8343     .9009
 14       -2.43   -1.49   -1.25    -.83    -.48    -.12     .44     -6.16    -.8800     .8778
 15       -2.17   -1.62   -1.34    -.97    -.66    -.32     .14     -6.94    -.9914     .7316
 16       -2.00   -1.55   -1.17   -1.04    -.83    -.54    -.30     -7.43   -1.0614     .5395
  Σ      -19.05   -8.63   -1.57    4.46   10.09   16.24   21.47     23.01    3.2871   13.2952
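The last three columns of Table 2 are the row sum, the row mean $\bar z_{i\cdot}$, and the row standard deviation $Z_{i\cdot}$. The sketch reproduces them for stimulus 1. It shows that the S.D. column uses divisor k = 7, the population form given by `statistics.pstdev`, rather than k − 1. This is the same convention the least squares derivation below relies on when it writes $kZ_{i\cdot}^2$ for the sum of squared deviations.

```python
from statistics import fmean, pstdev

# Row 1 of Table 2: z_1g at boundaries A..G.
z_row1 = [0.42, 1.17, 1.69, 1.88, 2.17, 2.43, 2.43]

total = sum(z_row1)     # Σz column
mean = fmean(z_row1)    # mean z_1. (divisor k = 7)
sd = pstdev(z_row1)     # S.D. Z_1. with divisor k = 7 (population form)
print(f"{total:.2f} {mean:.4f} {sd:.4f}")  # 12.19 1.7414 0.6789
```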








A Least Squares Solution 


For fallible data one solution to the problem is to utilize a least squares solution. This would be useful in many different experimental situations, and also useful as a solution against which various more rapid procedures might be evaluated.
Define an error E by

$$ E = \frac{1}{b^2}\sum_{i=1}^{n}\sum_{g=1}^{k} (m_i + s_i z_{ig} - t_g)^2 = \frac{1}{b^2}\sum_{i=1}^{n}\sum_{g=1}^{k} e_{ig}^2, \qquad (1) $$

where $m_i$, $s_i$, and $t_g$ are estimates respectively of $m^*_i$, $s^*_i$ and $t^*_g$. Specify an arbitrary origin (a) and unit (b) for the t-scale values by

$$ \sum_{g=1}^{k} t_g = ka, \qquad (2) $$

$$ \sum_{g=1}^{k} t_g^2 = k(a^2 + b^2). \qquad (3) $$

Equations (2) and (3) specify that the mean scale value of the division points ($t_g$) is a and that the standard deviation of the $t_g$ is b.

Since a designates the origin it is possible to set a = 0 if desired. However, 
the scale unit, b, must be positive; it cannot reasonably be taken as zero or 
as a negative value; e.g., a convenient possibility is b = 1. 

A least squares approach to the successive intervals scaling problem 
may be defined by stating that the values for $m_i$, $s_i$ and $t_g$ are those which minimize E as defined in equation (1) subject to the conditions given by (2) and (3).

Since the $z_{ig}$ are determined from experimental proportions of judgments $p_{ig}$, and “theoretical” proportions (say $\hat p_{ig}$) are derivable from the scale values indicated by $m_i$, $s_i$, and $t_g$, it is also possible to define the least squares problem in terms of minimizing

$$ \sum_{i=1}^{n}\sum_{g=1}^{k} (p_{ig} - \hat p_{ig})^2, $$

or in terms of minimizing some weighted discrepancy of these proportions, such as

$$ \sum_{i=1}^{n}\sum_{g=1}^{k} \left(\arcsin\sqrt{p_{ig}} - \arcsin\sqrt{\hat p_{ig}}\right)^2. $$


As indicated by Mosteller (5), the equations resulting from such a definition of the problem do not lend themselves to analytical solution. Since it seems reasonable to think of error in terms of the scale values, t, rather than in terms of the proportions, which may be regarded as an incidental step on the road to the scale values, equation (1) furnishes a reasonable expression for the scaling error.
Using two Lagrange multipliers γ and λ, the error function E given in equation (1), and the arbitrarily assigned mean and variance of the $t_g$ as given in equations (2) and (3), we may write a new function Q as follows:

$$ Q = \frac{1}{b^2}\sum_{i=1}^{n}\sum_{g=1}^{k} (m'_i + s'_i z_{ig} - t'_g)^2 - 2\lambda'\Big(\sum_{g=1}^{k} t'_g - ka\Big) - \gamma'\Big[\sum_{g=1}^{k} (t'_g)^2 - k(a^2 + b^2)\Big]. \qquad (4) $$

This function has (k + 2n + 2) unknowns, $m'_i$, $s'_i$, $t'_g$, γ′, and λ′. The problem is to find specific values ($m_i$, $s_i$, $t_g$, γ and λ) for these variables which minimize Q.

Differentiating (4) with respect to each of the $m_i$ in turn, we have the n equations

$$ \frac{1}{2}\frac{\partial Q}{\partial m_i} = \frac{1}{b^2}\sum_{g=1}^{k} (m_i + s_i z_{ig} - t_g)(+1), \qquad (i = 1, \ldots, n). \qquad (5) $$

The minimizing values are found by equating (5) to zero. Simplifying the results gives

$$ k m_i + s_i \sum_{g=1}^{k} z_{ig} - \sum_{g=1}^{k} t_g = 0. \qquad (6) $$

Using equation (2), setting

$$ \sum_{g=1}^{k} z_{ig} = k\bar z_{i\cdot}, \qquad (7) $$

simplifying and solving for $m_i$ gives

$$ m_i = a - s_i \bar z_{i\cdot}, \qquad (i = 1, \ldots, n). \qquad (8) $$

Equation (8) expresses $m_i$ as a function of $s_i$, so that the k + 2n + 2 unknowns can now be reduced to k + n + 2 unknowns.

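Differentiating (4) with respect to each of the $t_g$ in turn gives the k equations

$$ \frac{1}{2}\frac{\partial Q}{\partial t_g} = \frac{1}{b^2}\sum_{i=1}^{n} (m_i + s_i z_{ig} - t_g)(-1) - \lambda - \gamma t_g, \qquad (g = 1, \ldots, k). \qquad (9) $$

The minimizing values are found by setting (9) equal to zero. Summing over i and rearranging terms gives

$$ \frac{n - b^2\gamma}{b^2}\, t_g - \lambda - \frac{1}{b^2}\sum_{i=1}^{n} m_i - \frac{1}{b^2}\sum_{i=1}^{n} s_i z_{ig} = 0. \qquad (10) $$

From equation (8), (10) may be rewritten as

$$ \frac{n - b^2\gamma}{b^2}\, t_g - \lambda - \frac{na}{b^2} - \frac{1}{b^2}\sum_{i=1}^{n} s_i (z_{ig} - \bar z_{i\cdot}) = 0. \qquad (11) $$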
In order to solve for λ, sum (11) over g, using equation (7) and the condition stated in equation (2). This procedure gives

$$ \frac{n - b^2\gamma}{b^2}\, ka - k\lambda - \frac{kna}{b^2} = 0. \qquad (12) $$

Solving for λ we have

$$ \lambda = -\gamma a. \qquad (13) $$


Using equation (13) to simplify (11) and multiplying both sides by $b^2$ gives

$$ (n - b^2\gamma)(t_g - a) = \sum_{i=1}^{n} s_i (z_{ig} - \bar z_{i\cdot}), \qquad (g = 1, \ldots, k). \qquad (14) $$


Equation (14) may be used to find an expression for γ. If we square both sides of (14), sum over g, and use the condition given in equation (3) (by (2) and (3), $\sum_g (t_g - a)^2 = kb^2$), the result is

$$ (n - b^2\gamma)^2 k b^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{g=1}^{k} s_i s_j (z_{ig} - \bar z_{i\cdot})(z_{jg} - \bar z_{j\cdot}). \qquad (15) $$

Note that $\big[\sum_{i=1}^{n} s_i (z_{ig} - \bar z_{i\cdot})\big]^2$ may be written as a double summation over i and j. Since a sum of products of deviations from the mean is a covariance, (15) may be written

$$ (n - b^2\gamma)^2 k b^2 = k \sum_{i=1}^{n}\sum_{j=1}^{n} s_i Z_{i\cdot}\, s_j Z_{j\cdot}\, r_{ij} \qquad (r_{ii} = 1), \qquad (16) $$

where $r_{ij}$ is the correlation between $z_{ig}$ and $z_{jg}$, and $Z_{i\cdot}$ is the standard deviation of $z_{ig}$ (g = 1, ..., k). If there is a solution for $s_i$, then equation (16) gives γ in terms of known quantities.
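If we define the row vector X by

$$ X = (s_1 Z_{1\cdot},\ s_2 Z_{2\cdot},\ \ldots,\ s_n Z_{n\cdot}) = (x_1, x_2, \ldots, x_n) \qquad (17) $$

and

$$ R = \| r_{ij} \| \qquad (r_{ii} = 1), \qquad (18) $$

then (16) can be rewritten as

$$ (n - b^2\gamma)^2 b^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j r_{ij} = XRX'. \qquad (19) $$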


We now differentiate (4) with respect to each of the $s_i$ in turn, giving the n equations

$$ \frac{1}{2}\frac{\partial Q}{\partial s_i} = \frac{1}{b^2}\sum_{g=1}^{k} (m_i + s_i z_{ig} - t_g)\, z_{ig}, \qquad (i = 1, \ldots, n). \qquad (20) $$
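Removing parentheses, summing, and setting the partial derivative equal to zero, we have

$$ m_i k \bar z_{i\cdot} + s_i \sum_{g=1}^{k} z_{ig}^2 - \sum_{g=1}^{k} t_g z_{ig} = 0. \qquad (21) $$

Expressing $m_i$ as a function of $s_i$ from equation (8), we have

$$ s_i \sum_{g=1}^{k} z_{ig}^2 - k s_i \bar z_{i\cdot}^2 = \sum_{g=1}^{k} t_g z_{ig} - k a \bar z_{i\cdot}. \qquad (22) $$

The right side of (22) equals $\sum_g (t_g - a) z_{ig}$. Expressing $(t_g - a)$ as a function of the $s_j$ from equation (14), and utilizing the two identities

$$ \sum_{g=1}^{k} z_{ig}^2 - k \bar z_{i\cdot}^2 = k Z_{i\cdot}^2 $$

and

$$ \sum_{g=1}^{k} z_{ig}(z_{jg} - \bar z_{j\cdot}) = \sum_{g=1}^{k} (z_{ig} - \bar z_{i\cdot})(z_{jg} - \bar z_{j\cdot}), $$

we have

$$ (n - b^2\gamma)\, k s_i Z_{i\cdot}^2 = \sum_{j=1}^{n} s_j \sum_{g=1}^{k} (z_{ig} - \bar z_{i\cdot})(z_{jg} - \bar z_{j\cdot}). \qquad (23) $$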


Writing (23) in terms of standard deviations and correlations as in equation (16), and dividing both sides by $kZ_{i\cdot}$, we have

$$ (n - b^2\gamma)\, s_i Z_{i\cdot} = \sum_{j=1}^{n} s_j Z_{j\cdot}\, r_{ij}. \qquad (24) $$

Using the notation given by (17) and (18), (24) becomes

$$ (n - b^2\gamma)\, x_i = \sum_{j=1}^{n} x_j r_{ji}, \qquad (i = 1, \ldots, n), \qquad (25) $$

or

$$ XR = (n - b^2\gamma) X. \qquad (26) $$

Let us define the scalar θ by

$$ \theta = n - b^2\gamma. \qquad (27) $$

Since θ is a scalar, $\theta X = X\theta I$; thus (26) may be written

$$ X[R - \theta I] = 0. \qquad (28) $$


Using definition (27), equation (19) is

$$ \theta^2 b^2 = XRX'. \qquad (29) $$

Postmultiplying (28) by X′ and substituting from (29), we have

$$ \theta b^2 = XX'. \qquad (30) $$

Equation (28) can be solved routinely by finding the various values of θ which are the roots of

$$ |R - \theta I| = 0. \qquad (31) $$

Using these values of θ in (28), the elements of X are proportional to the cofactors of the determinant given in (31). The proportionality constant is chosen so that equation (30) is satisfied. The $s_i$ are then given by equation (17), the $t_g$ by equation (14), and the $m_i$ by equation (8). Thus the unknowns $t_g$, $m_i$ and $s_i$ can all be found.
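The procedure just described can be sketched with NumPy, under stated assumptions. `numpy.linalg.eigh` is used in place of the cofactor method to obtain the largest latent root and its vector. The sign of X is chosen so that the $s_i$ are positive. The function name `successive_intervals_lsq` is hypothetical. The data are the $z_{ig}$ of Table 2, with a = 0 and b = 1. The sketch checks four things: the $t_g$ have mean a and standard deviation b, as (2) and (3) require; E from (1) agrees with (41); and the $s_i$ are positive. The θ it prints can be compared with the value reported for these data, but it is not asserted here.

```python
import numpy as np

def successive_intervals_lsq(z, a=0.0, b=1.0):
    """Sketch of the least squares solution; returns m_i, s_i, t_g, theta, E."""
    z = np.asarray(z, dtype=float)
    n, k = z.shape
    zbar = z.mean(axis=1)                        # row means z_i., eq. (7)
    dev = z - zbar[:, None]                      # deviations z_ig - z_i.
    Z = np.sqrt((dev ** 2).mean(axis=1))         # row S.D.s Z_i. (divisor k)
    R = (dev @ dev.T) / (k * np.outer(Z, Z))     # correlations r_ij, eq. (18)
    eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    theta = eigvals[-1]                          # largest root of |R - theta I| = 0
    X = eigvecs[:, -1]
    if X.sum() < 0:                              # choose the sign giving positive s_i
        X = -X
    X = X * np.sqrt(theta * b ** 2 / (X @ X))    # scale so that XX' = theta b^2, eq. (30)
    s = X / Z                                    # x_i = s_i Z_i., eq. (17)
    t = a + (s @ dev) / theta                    # eq. (14), using theta = n - b^2 gamma
    m = a - s * zbar                             # eq. (8)
    E = (n - theta) * k                          # eq. (41)
    return m, s, t, theta, E

# z_ig from Table 2 (16 stimuli by boundaries A..G).
z_table2 = np.array([
    [ 0.42,  1.17,  1.69,  1.88,  2.17,  2.43,  2.43],
    [ 0.12,  0.91,  1.62,  1.88,  2.00,  2.00,  2.17],
    [-0.01,  0.75,  1.07,  1.55,  1.62,  2.17,  2.43],
    [-0.16,  0.86,  1.62,  1.78,  2.00,  2.43,  2.43],
    [-0.68, -0.07,  0.46,  1.25,  1.62,  2.17,  2.43],
    [-1.25, -0.52,  0.01,  0.40,  0.91,  1.49,  1.88],
    [-1.00, -0.59, -0.24,  0.32,  0.73,  1.14,  1.49],
    [-1.14, -0.70, -0.26,  0.24,  0.75,  1.10,  1.44],
    [-1.44, -0.94, -0.32,  0.07,  0.50,  0.94,  1.25],
    [-1.78, -1.21, -0.83, -0.40,  0.10,  0.54,  1.04],
    [-1.10, -0.70, -0.38, -0.16,  0.20,  0.48,  0.83],
    [-2.00, -1.44, -1.04, -0.73, -0.16,  0.34,  0.91],
    [-2.43, -1.49, -1.21, -0.78, -0.38, -0.01,  0.46],
    [-2.43, -1.49, -1.25, -0.83, -0.48, -0.12,  0.44],
    [-2.17, -1.62, -1.34, -0.97, -0.66, -0.32,  0.14],
    [-2.00, -1.55, -1.17, -1.04, -0.83, -0.54, -0.30],
])
m, s, t, theta, E = successive_intervals_lsq(z_table2)
E_direct = ((m[:, None] + s[:, None] * z_table2 - t[None, :]) ** 2).sum()   # eq. (1) with b = 1
print(f"theta = {theta:.4f}, E = {E:.4f}, E from (1) = {E_direct:.4f}")
```

Using `eigh` is a design choice: R is symmetric, so a symmetric eigensolver returns real roots in ascending order, and the last one is the largest root required by the minimization argument that follows.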
In order to determine which of the latent roots (@) should be used to 


secure the minimum value of FE it is necessary to evaluate E in terms of 6 
or y. Substituting (8), (14), and (27) in (1) gives 


“p> , i E (4 — %.) — = 5 oe Silee — z;. 2). (32) 


#=1 g=1 


This expression is easier to manipulate if we define 


Yio = 8:(2:, — 3;.), (33) 
and 
> Yio = NY -g e (34) 
Thus we have 
1 n k 2 
B= ED [m -50.]- 4 


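Squaring and summing,

$$E = \frac{1}{b^{2}}\left[\sum_{i=1}^{n}\sum_{g=1}^{k} y_{ig}^{2} - \frac{2n}{\theta}\sum_{g=1}^{k} y_{\cdot g}\sum_{i=1}^{n} y_{ig} + \frac{n^{3}}{\theta^{2}}\sum_{g=1}^{k} y_{\cdot g}^{2}\right];$$

from the definition of $y_{\cdot g}$ and $y_{ig}$,

$$E = \frac{1}{b^{2}}\left[\sum_{i=1}^{n}\sum_{g=1}^{k} y_{ig}^{2} + \frac{n^{2}}{\theta}\left(\frac{n}{\theta} - 2\right)\sum_{g=1}^{k} y_{\cdot g}^{2}\right]. \tag{36}$$

One of the n's may be put back under the summation sign, giving

$$E = \frac{1}{b^{2}}\left[\sum_{i=1}^{n}\sum_{g=1}^{k} y_{ig}^{2} + \frac{n}{\theta}\left(\frac{n}{\theta} - 2\right)\sum_{g=1}^{k} n\,y_{\cdot g}^{2}\right]. \tag{37}$$

Using the definition of $y_{\cdot g}$ and triple summation notation we have

$$E = \frac{1}{b^{2}}\left[\sum_{i=1}^{n}\sum_{g=1}^{k} y_{ig}^{2} + \frac{1}{\theta}\left(\frac{n}{\theta} - 2\right)\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{g=1}^{k} y_{ig}\,y_{jg}\right]. \tag{38}$$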
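From the definition of $y_{ig}$, and defining $k\,r_{ij}\sigma_i\sigma_j$ as $\sum_{g=1}^{k}(z_{ig} - z_{i\cdot})(z_{jg} - z_{j\cdot})$, we have

$$E = \frac{1}{b^{2}}\left[\sum_{i=1}^{n} s_i^{2}\sigma_i^{2}k + \frac{1}{\theta}\left(\frac{n}{\theta} - 2\right)\sum_{i=1}^{n}\sum_{j=1}^{n} s_i s_j\sigma_i\sigma_j r_{ij}k\right]. \tag{39}$$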


Thus using the notation of equations (17) and (18) we have

$$E = \frac{k}{b^{2}}\left[XX' + \frac{1}{\theta}\left(\frac{n}{\theta} - 2\right)XRX'\right]. \tag{40}$$

Substituting equations (29) and (30) in (40) and simplifying,

$$E = (n - \theta)k. \tag{41}$$
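Because the steps from (32) to (40) use only equations (8) and (14), identity (40) should hold for any choice of the $s_i$ and any positive $\theta$, not only at the solution of (28); (29) and (30) enter only in the step to (41). A small numerical check with hypothetical random values, again assuming $\sigma_i$ with divisor k and (14) in the form $t_g = a + \theta^{-1}\sum_j s_j(z_{jg} - z_{j\cdot})$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, a, b = 6, 5, 0.3, 1.7
z = rng.normal(size=(n, k))            # hypothetical z_ig values
s = rng.uniform(0.5, 2.0, size=n)      # arbitrary s_i, not the optimum
theta = 4.2                            # arbitrary positive scalar

zbar, sigma = z.mean(axis=1), z.std(axis=1)
R = np.corrcoef(z)
X = s * sigma                          # eq. (17)

t = a + (s[:, None] * (z - zbar[:, None])).sum(axis=0) / theta   # eq. (14), assumed form
m = a - s * zbar                                                   # eq. (8)
E_direct = ((s[:, None] * z + m[:, None] - t) ** 2).sum() / b**2  # eq. (1)
E_eq40 = (k / b**2) * (X @ X + (n / theta - 2) * (X @ R @ X) / theta)
print(E_direct, E_eq40)
```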


We note that $\theta$ designates the latent roots of (28) and E is to be minimized. Since by (41) E decreases as $\theta$ increases, the largest value of $\theta$ is the appropriate one to use, as is usually the case in minimizing a quadratic form. Using equation (27) E may also be expressed as

$$E = kb^{2}\gamma. \tag{42}$$

Equations (41) or (42) give the error which is associated with the solution indicated by equations (28) and (31). The data of Table 2 have been used to illustrate this solution for successive intervals. The correlation matrix $\|r_{ij}\|$ was computed from Table 2. The largest latent root ($\theta$) of this matrix and the corresponding vector X, together with a frequency distribution of the correlations ($r_{ij}$), are given in Table 3. Choosing b = 1 in equation (30) and using equations (17), (14), (8), and (41), values of the $t_g$, $m_i$, $s_i$ and E were calculated. These are given in Tables 4, 5, and 6 under the column headed “Least Squares.”

TABLE 3

Values of $r_{ij}$, X, and $\theta$ Computed from Table 2

| $r_{ij}$ interval | Freq. |
|---|---|
| 1.00-.99 | 52 |
| .99-.98 | 28 |
| .98-.97 | 5 |
| .97-.96 | 4 |
| .96-.95 | 8 |
| .95-.94 | 5 |
| .94-.93 | 5 |
| .93-.92 | 4 |
| .92-.91 | 1 |
| .91-.90 | 3 |
| .90-.89 | 2 |
| .89-.88 | 1 |

| Stimulus | No. | X |
|---|---|---|
| Irish | 1 | .9787 |
| French | 2 | .9402 |
| Swedish | 3 | .9945 |
| Norwegian | 4 | .9690 |
| Austrian | 5 | .9940 |
| Italian | 6 | .9979 |
| Russian | 7 | .9912 |
| S. American | 8 | .9934 |
| Polish | 9 | .9974 |
| Jewish | 10 | .9951 |
| Greek | 11 | .9935 |
| Mexican | 12 | .9981 |
| Japanese | 13 | .9966 |
| Chinese | 14 | .9939 |
| Hindu | 15 | .9953 |
| Negro | 16 | .9974 |

$\theta$ = 15.6376


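TABLE 4

Scale Values ($m_i$)

Twenty-five nationalities in ten categories (incomplete data): Gulliksen and Saffir columns. Sixteen nationalities in eight categories: Incomplete Data column, and Simplified and Least Squares columns computed from the complete data.

| Stimulus | No. | Gulliksen | Saffir | No. | Incomplete Data | Simplified | Least Squares |
|---|---|---|---|---|---|---|---|
| American | 1 | -.2629 | | | | | |
| English | 2 | 0.000 | 0.000 | | | | |
| Canadian | 3 | .2726 | .2728 | | | | |
| Scottish | 4 | (illegible) | (illegible) | | | | |
| German | 5 | .9670 | 1.02 | | | | |
| Irish | 6 | 1.0858 | 1.1637 | 1 | -2.02277 | -2.5650 | -2.5102 |
| French | 7 | 1.6091 | 1.7486 | 2 | -1.69364 | -2.2082 | -2.0761 |
| Swedish | 8 | 1.8412 | 1.9293 | 3 | -1.63853 | -1.7590 | -1.7495 |
| Norwegian | 9 | 1.9085 | 2.0439 | 4 | -1.54744 | -1.8153 | -1.7589 |
| Holland | 10 | 1.9664 | 2.1279 | | | | |
| Belgian | 11 | 2.0389 | 2.2500 | | | | |
| Austrian | 12 | 2.4007 | 2.6971 | 5 | -.92028 | -.9526 | -.9469 |
| Spanish | 13 | 2.8640 | 3.2459 | | | | |
| Italian | 14 | 3.1419 | 3.4891 | 6 | -.37261 | -.4069 | -.4060 |
| Russian | 15 | 3.2588 | 3.6581 | 7 | -.27484 | -.3105 | -.3078 |
| S. American | 16 | 3.2683 | 3.7851 | 8 | -.20071 | -.2311 | -.2296 |
| Pole | 17 | 3.6270 | 4.1406 | 9 | .00000 | -.0094 | -.0095 |
| Jew | 18 | 3.8357 | 4.3902 | 10 | .21274 | .1899 | .1889 |
| Greek | 19 | 4.0232 | 4.6351 | 11 | .42627 | .3938 | .3913 |
| Mexican | 20 | 4.3154 | 5.0213 | 12 | .66997 | .6238 | .6164 |
| Japanese | 21 | 4.7116 | 5.5448 | 13 | 1.00411 | .9261 | .9229 |
| Chinese | 22 | 4.7814 | 5.6501 | 14 | 1.08359 | 1.0025 | .9964 |
| Hindu | 23 | 5.0005 | 5.9533 | 15 | 1.41355 | 1.3551 | 1.3486 |
| Turkish | 24 | 5.4045 | 6.4682 | | | | |
| Negro | 25 | 5.9666 | 7.2720 | 16 | 2.17197 | 1.9676 | 1.9624 |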





The corresponding values originally published by Saffir are also given 
in the column headed “Saffir.” It should be noted that these values cannot 
legitimately be directly compared with those obtained by the present pro- 
cedure since they were obtained from the entire matrix for 25 stimuli and 
10 categories. Despite this fact the agreement (to a linear transformation) 
is good. 

I wish to thank Professor Allen Edwards for calling my attention to
the fact that the problem of successive intervals is formally identical in a
number of respects with the “Problem of the Matrix of Incomplete Data”
solved by Paul Horst [see Horst (4), pp. 419-430]. The parallelism between
the two problems may be indicated as follows:


Successive Intervals: Categories are assigned by N Judges to n Stimuli;
Incomplete Data: Ratings are assigned to N Persons by n Raters.


The equivalent terms are Categories-Ratings, Judges-Persons, and Stimuli-Raters. The problem is to determine an unknown mean and standard deviation for each Rater (or Stimulus), and an unknown scale-value for each Rating (or Category Boundary).

TABLE 5

Discriminal Dispersions ($s_i$)

Twenty-five nationalities in ten categories (incomplete data): Gulliksen and Saffir columns. Sixteen nationalities in eight categories: Incomplete Data column, and Simplified and Least Squares columns computed from the complete data.

| Stimulus | No. | Gulliksen | Saffir | No. | Incomplete Data | Simplified | Least Squares |
|---|---|---|---|---|---|---|---|
| American | 1 | .9643 | | | | | |
| English | 2 | 1.0000 | 1.000 | | | | |
| Canadian | 3 | 1.0370 | 1.034 | | | | |
| Scottish | 4 | 1.1287 | 1.433 | | | | |
| German | 5 | 1.3111 | 1.462 | | | | |
| Irish | 6 | 1.3652 | 1.502 | 1 | .96313 | 1.473 | 1.442 |
| French | 7 | 1.2278 | 1.201 | 2 | .81950 | 1.445 | 1.358 |
| Swedish | 8 | 1.1444 | 1.464 | 3 | 1.10848 | 1.285 | 1.278 |
| Norwegian | 9 | .9679 | 1.064 | 4 | .81954 | 1.159 | 1.125 |
| Holland | 10 | .9830 | 1.122 | | | | |
| Belgian | 11 | .8599 | .992 | | | | |
| Austrian | 12 | 1.1409 | 1.343 | 5 | .88696 | .929 | .923 |
| Spanish | 13 | 1.0271 | 1.191 | | | | |
| Italian | 14 | 1.1493 | 1.602 | 6 | 1.00000 | .975 | .973 |
| Russian | 15 | 1.2336 | 1.896 | 7 | 1.20364 | 1.175 | 1.164 |
| S. American | 16 | 1.2713 | 1.816 | 8 | 1.14195 | (illegible) | 1.124 |
| Pole | 17 | 1.4217 | 1.823 | 9 | 1.14742 | 1.102 | 1.100 |
| Jew | 18 | 1.8769 | 2.473 | 10 | 1.64717 | 1.601 | 1.593 |
| Greek | 19 | 1.2066 | 1.609 | 11 | 1.10058 | 1.085 | 1.078 |
| Mexican | 20 | 1.2144 | 1.575 | 12 | 1.01238 | 1.060 | 1.047 |
| Japanese | 21 | 1.3111 | 1.917 | 13 | 1.22872 | 1.110 | 1.106 |
| Chinese | 22 | 1.3300 | 1.924 | 14 | 1.23061 | 1.139 | 1.132 |
| Hindu | 23 | 1.4486 | 1.965 | 15 | 1.40228 | 1.367 | 1.360 |
| Turkish | 24 | 1.2510 | 1.580 | | | | |
| Negro | 25 | 2.3048 | 3.219 | 16 | 2.05601 | 1.854 | 1.849 |
The correlation between the theoretical category boundaries ($t_g$) and the experimental category boundaries (given by $m_i + s_i z_{ig}$) is Horst's $\gamma$; it is given in terms of the present development by

$$\gamma^{2} = 1 - \frac{E}{kn}.$$

Minimizing the error term E as given by equation (1) is equivalent to maximizing $\gamma$, the correlation given by Horst's equation (12).
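The relation between Horst's $\gamma$ and E can be checked numerically. The sketch below assumes that the correlation is taken over all kn cells, pairing $t_g$ with $m_i + s_i z_{ig}$, and uses a = 0 and b = 1 with hypothetical data; under the least-squares solution, $\gamma^2$ should come out equal both to $1 - E/kn$ and to $\theta/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 6
z = np.linspace(-1.2, 1.2, k) + rng.normal(0.0, 0.3, (n, k))   # hypothetical data

# Least-squares solution with a = 0 and b = 1 (same steps as equations (28)-(31)).
zbar, sigma = z.mean(axis=1), z.std(axis=1)
roots, vectors = np.linalg.eigh(np.corrcoef(z))
theta, x = roots[-1], vectors[:, -1]
x = x * np.sqrt(theta) / np.linalg.norm(x)          # XX' = theta when b = 1
s = x / sigma
t = (s[:, None] * (z - zbar[:, None])).sum(axis=0) / theta
m = -s * zbar

fitted = s[:, None] * z + m[:, None]                # experimental boundaries m_i + s_i z_ig
theory = np.broadcast_to(t, (n, k))                 # t_g repeated for every stimulus
gamma = np.corrcoef(fitted.ravel(), theory.ravel())[0, 1]
E = ((fitted - theory) ** 2).sum()                  # eq. (1) with b = 1
print(gamma**2, 1 - E / (k * n), theta / n)
```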
The elements of the vector Horst designates X are analogous to $t_g$ of the present development, and the elements of the vector B are analogous to $m_i$. The vector X as given by equation (28) here is equivalent to an equation for Horst's vector A that may be derived by assuming a matrix of complete data and substituting equation (29) into equation (30) in Horst's development.

TABLE 6

Category Boundaries ($t_g$)

Twenty-five nationalities in ten categories (incomplete data): Gulliksen column. Sixteen nationalities in eight categories: Incomplete Data column, and Simplified and Least Squares columns computed from the complete data.

| g | Gulliksen | g | Incomplete Data | Simplified | Least Squares |
|---|---|---|---|---|---|
| 1 | 0.5821 | 1 | -1.59113 | -1.7308 | -1.7087 |
| 2 | 1.6710 | 2 | -0.93887 | -0.9142 | -.9036 |
| 3 | 2.5376 | 3 | -0.42766 | -0.35735 | -.3557 |
| 4 | 3.1506 | 4 | 0.03802 | .0952 | .0923 |
| 5 | 3.6373 | 5 | 0.54048 | .5236 | .5168 |
| 6 | 4.1205 | 6 | 1.02699 | .9910 | .9802 |
| 7 | 4.6464 | 7 | 1.57412 | 1.3926 | 1.3787 |
| 8 | 5.1537 | | | | |
| 9 | 5.8911 | | | | |

| Quantity | Gulliksen | Incomplete Data | Simplified | Least Squares |
|---|---|---|---|---|
| $\bar t = a$ | 3.4878 | 0.03171 | -.00004 | 0.0000 |
| (label illegible) | .02988 | .00464 | .022845 | .02265 |
| $\sum(t_g - \bar t)^2 = \sum t_g^2 - k\bar t^2$ | 23.076575 | 7.415154 | 7.16380 | 7.00000 |
| $b^2$ | | 1.059308 | 1.02340 | 1.000 |
| $\sum\sum(s_i z_{ig} + m_i - t_g)^2$ | 13.105077 | 3.44025 | | 2.5360 |
| E (for complete data) | | | 2.558573 | 2.5360 |
| f | .0308 | .0046633 | | |

One difference between the two developments is indicated by Horst's
statement “... we assume without loss of generality that the ratings for
each rater are given in terms of standard scores ...” Since the “scaling
constants” for the various raters are not to be compared, no special significance
attaches to the vector A. Therefore, this assumption may be retained for
the incomplete data problem. However, the corresponding vector X in the
successive intervals problem is of interest only in leading to a value for the
discriminal dispersion of each stimulus. This requires equation (17), which
utilizes the original standard deviation for each stimulus rather than an
arbitrary one.


Assuming Equal $r_{ij}$ or Using Average $r_{ij}$ ($i \neq j$)

Since the correlations $r_{ij}$ are mostly in the high nineties and fairly close together, it is reasonable to investigate the special case in which all the $r_{ij}$ are assumed to be so much alike that some average experimental value ($\bar r$) may plausibly represent the population value ($r^{*}$). The corresponding correlation matrix with unity in the diagonal cells and $\bar r$ in each off-diagonal cell is designated $\bar R$.

The largest root of the determinantal equation

$$|\bar R - \theta I| = 0 \tag{43}$$

is

$$\theta = 1 + (n - 1)\bar r. \tag{44}$$

Substituting this value in (43) gives

$$\left|\bar R - [1 + (n - 1)\bar r]\,I\right| = 0. \tag{45}$$


The cofactors of the first row of this determinant are proportional to the $x_i$; thus, with c a constant of proportionality,

$$c\,x_i = (-\bar r)^{n-1}\,n^{n-2}. \tag{46}$$
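A quick numerical check of (46): for $\bar R$ with unity on the diagonal and $\bar r$ elsewhere, $\bar R - \theta I = \bar r(J - nI)$, where J is the matrix of ones, so every cofactor should equal $(-\bar r)^{n-1}n^{n-2}$. The helper below computes cofactors directly from minors, which is adequate only for small n; the values of n and $\bar r$ are arbitrary choices for illustration.

```python
import numpy as np


def cofactor_matrix(M):
    """C_ij = (-1)^(i+j) * det(M with row i and column j deleted)."""
    size = M.shape[0]
    C = np.empty_like(M)
    for i in range(size):
        for j in range(size):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C


n, rbar = 5, 0.9
Rbar = np.full((n, n), rbar) + (1.0 - rbar) * np.eye(n)   # unity diagonal, rbar elsewhere
theta = 1 + (n - 1) * rbar                                 # eq. (44)
cofactors = cofactor_matrix(Rbar - theta * np.eye(n))
predicted = (-rbar) ** (n - 1) * n ** (n - 2)              # eq. (46)
print(cofactors[0], predicted)
```

Since all cofactors are equal, all $x_i$ are equal, which is the observation the text uses next.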


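The $x_i$ are all equal, subject to the restriction of equation (30), which in this special case becomes

$$XX' = [1 + (n - 1)\bar r]\,b^{2}. \tag{47}$$

As $\bar r$ approaches unity, $XX'$ approaches $nb^{2}$. As $\bar r$ approaches zero, $XX'$ approaches $b^{2}$. Designating $x_i$ by $c'$ we have

$$c' = b\sqrt{\frac{1 + (n - 1)\bar r}{n}} = b\sqrt{\frac{\theta}{n}}. \tag{48}$$

From equation (17)

$$s_i = \frac{c'}{\sigma_i}. \tag{49}$$

From equation (8)

$$m_i = a - c'\,\frac{z_{i\cdot}}{\sigma_i}. \tag{50}$$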
From (14), (17), and (44)

$$t_g = a + \frac{c'}{\theta}\sum_{i=1}^{n}\frac{z_{ig} - z_{i\cdot}}{\sigma_i} = a + \frac{b}{\sqrt{n\theta}}\sum_{i=1}^{n}\frac{z_{ig} - z_{i\cdot}}{\sigma_i}. \tag{51}$$


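Instead of going to the labor of computing an average intercorrelation in order to obtain a value for $\bar r$, the same $t_g$ values can be obtained if it is possible to define $\theta$ in some simpler manner. To obtain $\theta$ by an easier route we may define a provisional value $u_g$ by

$$u_g = \sum_{i=1}^{n}\frac{z_{ig} - z_{i\cdot}}{\sigma_i}. \tag{52}$$

Note that the standard deviation ($\sigma_u$) of the $u_g$ is related to $\theta$ and to $\bar r$, the average $r_{ij}$ ($i \neq j$), by

$$\sigma_u = \sqrt{n + n(n - 1)\bar r} = \sqrt{n\theta}, \tag{53}$$

since $\sigma_u^{2} = \frac{1}{k}\sum_g u_g^{2} = \sum_i\sum_j r_{ij}$, so that

$$\theta = \frac{\sigma_u^{2}}{n}.$$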


Selecting arbitrary values a and b for the mean and standard deviation of $t_g$, equation (51) becomes

$$t_g = a + b\,\frac{u_g}{\sigma_u}. \tag{54}$$

Correspondingly, $c'$ of equation (48) becomes

$$c' = \frac{b\,\sigma_u}{n}. \tag{55}$$


Thus if we are willing to assume that all the $r_{ij}$ represent a single population value, the solutions from the matrix of complete data for $m_i$, $s_i$, and $t_g$ are given by equations (49), (50) and (54) with the aid of the definitions in equations (52) and (55).

From equations (41) and (44) the sum of the squared discrepancies for this procedure is given by

$$E = kn\left[1 - \frac{\theta}{n}\right] = k(n - 1)(1 - \bar r), \tag{56}$$

where, as in equation (53), $\bar r$ is the average $r_{ij}$ ($i \neq j$).


Bert F. Green, Jr., has pointed out that even when the $r_{ij}$ are not equal,
if the solution of equations (48) to (51) is used, then the error is still correctly
given by equation (56).
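Green's remark can be checked numerically. The sketch below applies equations (49), (50), (52), (54) and (55) to hypothetical data whose intercorrelations are deliberately unequal and compares E with $k(n-1)(1-\bar r)$. Computing $\sigma_i$ and $\sigma_u$ with divisor k is an assumption about the intended definitions, and the helper name is hypothetical.

```python
import numpy as np


def equal_r_solution(z, a=0.0, b=1.0):
    """Hypothetical helper for eqs. (49), (50), (52), (54), (55)."""
    n = z.shape[0]
    zbar, sigma = z.mean(axis=1), z.std(axis=1)
    u = ((z - zbar[:, None]) / sigma[:, None]).sum(axis=0)   # eq. (52)
    sigma_u = u.std()                                        # eq. (53), divisor k
    t = a + b * u / sigma_u                                  # eq. (54)
    c_prime = b * sigma_u / n                                # eq. (55)
    s = c_prime / sigma                                      # eq. (49)
    m = a - c_prime * zbar / sigma                           # eq. (50)
    return t, m, s


rng = np.random.default_rng(3)
n, k, a, b = 8, 6, 0.5, 1.3
# Hypothetical data with deliberately unequal intercorrelations.
z = np.linspace(-1.0, 1.0, k) * rng.uniform(0.7, 1.4, (n, 1)) + rng.normal(0.0, 0.4, (n, k))
t, m, s = equal_r_solution(z, a, b)

R = np.corrcoef(z)
rbar = R[~np.eye(n, dtype=bool)].mean()                      # average r_ij, i != j
E = ((s[:, None] * z + m[:, None] - t) ** 2).sum() / b**2    # eq. (1)
print(E, k * (n - 1) * (1 - rbar))                           # eq. (56)
```

The agreement does not depend on the choice of a and b, in line with the remark that E was defined so as not to vary with them.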

The simplest solution of this type is given by setting $c' = 1$ in equation (48), which is equivalent to

$$b = \sqrt{\frac{n}{1 + (n - 1)\bar r}} = \sqrt{\frac{n}{\theta}}, \tag{57}$$

and

$$a = 0 \tag{58}$$

in equation (50); whereupon equation (49) becomes

$$s_i = \frac{1}{\sigma_i}, \tag{59}$$

equation (50) becomes

$$m_i = -\frac{z_{i\cdot}}{\sigma_i}, \tag{60}$$

and equation (54) becomes

$$t_g = \frac{n}{\sigma_u^{2}}\,u_g = \frac{u_g}{\theta}, \tag{61}$$

where $u_g$ is defined by equation (52).
The error E is still correctly given by equation (56), since E was defined
so that it would not vary with the arbitrary values selected for a and b.
Scale values calculated by formulas (59) to (61) with $c' = 1$ and $a = 0$
are shown in Tables 4, 5, and 6 in the columns headed “Simplified.” Equa-
tions (59) to (61) are probably the best ones to use if the matrix of complete
data is available.
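Equations (59) to (61) need only row means, row standard deviations and the column sums $u_g$. A minimal sketch with hypothetical data, again assuming $\sigma_i$ with divisor k; the check confirms that the implied $b^2$ is $n/\theta$, as in (57), and that E equals $k(n-\theta)$ with $\theta = \sigma_u^2/n$.

```python
import numpy as np


def simplified_solution(z):
    """Eqs. (59)-(61): c' = 1 and a = 0 (hypothetical helper name)."""
    n = z.shape[0]
    zbar, sigma = z.mean(axis=1), z.std(axis=1)
    s = 1.0 / sigma                                          # eq. (59)
    m = -zbar / sigma                                        # eq. (60)
    u = ((z - zbar[:, None]) / sigma[:, None]).sum(axis=0)  # eq. (52)
    theta = u.var() / n                                      # sigma_u^2 / n
    t = u / theta                                            # eq. (61)
    return t, m, s, theta


rng = np.random.default_rng(4)
n, k = 12, 7
z = np.linspace(-1.5, 1.5, k) + rng.normal(0.0, 0.3, (n, k))   # hypothetical data
t, m, s, theta = simplified_solution(z)
b2 = t.var()                                                 # implied b^2, eq. (57): n / theta
E = ((s[:, None] * z + m[:, None] - t) ** 2).sum() / b2     # eq. (1)
print(b2, n / theta, E, k * (n - theta))
```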


Specifying the Scale in Terms of the Stimuli 


The typical experiment will give a matrix of incomplete data. In this
case it seems best to define the scale by arbitrarily assigning a scale value
($m_i$) and a discriminal dispersion ($s_i$) to one of the stimuli rather than by
some arbitrary assignment of values for a and b. Let us first see how this
procedure may be used for the matrix of complete data and then apply it
to the matrix of incomplete data.

In the solution indicated in Saffir (7) the unit and origin for the scale
values ($t_g$) are not determined directly, but instead are fixed indirectly by
specifying some stimulus (say c) as the criterion stimulus and assigning the
values zero and one respectively to the mean ($m_c$) and the standard deviation
($s_c$) for this stimulus.

For the case in which all $z_{ig}$ are available, and in which all the correlations $r_{ij}$ may be regarded as equal, this procedure is equivalent to setting

$$a = z_{c\cdot} \tag{62}$$

and

$$b = \sigma_c\sqrt{\frac{n}{\theta}} = \frac{n\,\sigma_c}{\sigma_u}. \tag{63}$$

We have then

$$s_i = \frac{\sigma_c}{\sigma_i}, \tag{64}$$

which simplifies to unity when i is c and hence satisfies the condition on the standard deviation of stimulus c. Also

$$m_i = z_{c\cdot} - \sigma_c\,\frac{z_{i\cdot}}{\sigma_i}. \tag{65}$$

Again for i = c this equation gives a mean of zero and satisfies the arbitrary condition on the mean of stimulus c.
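We also have

$$t_g' = a + b\,\frac{u_g}{\sigma_u} = z_{c\cdot} + \frac{\sigma_c}{\theta}\sum_{i=1}^{n}\frac{z_{ig} - z_{i\cdot}}{\sigma_i}, \tag{66}$$

where $\theta$ is given by (52) and (53). Or, expressing the value of $t_g'$ in terms of $m_i$ and $s_i$, we have

$$t_g' = z_{c\cdot} + \frac{1}{\theta}\sum_{i=1}^{n}\left(s_i z_{ig} + m_i - z_{c\cdot}\right). \tag{67}$$

It may be verified that the two definitions of $t_g'$ are equivalent and satisfy the conditions given by equations (2), (3), (62) and (63), namely,

$$\sum_{g=1}^{k} t_g' = k\,z_{c\cdot} \quad\text{and}\quad \sum_{g=1}^{k} t_g'^{\,2} = k\left(z_{c\cdot}^{2} + b^{2}\right).$$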
Equation (44) may be written

$$\theta = \frac{\sigma_u^{2}}{n}. \tag{68}$$

The error associated with this procedure is given by

$$E = kn\left[1 - \left(\frac{\sigma_u}{n}\right)^{2}\right]. \tag{69}$$

Since $(\sigma_u/n)^{2} = \theta/n$ by (68), the error given by equation (56) is the same as that given by equation (69).
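The criterion-stimulus scaling of equations (62) to (67) can be checked numerically. A sketch, assuming complete hypothetical data, $\sigma_i$ with divisor k, and an arbitrarily chosen criterion stimulus `c` (a 0-based index here); the helper name is hypothetical.

```python
import numpy as np


def criterion_scaling(z, c):
    """Hypothetical helper for eqs. (63)-(67) with stimulus c as the criterion."""
    n = z.shape[0]
    zbar, sigma = z.mean(axis=1), z.std(axis=1)
    u = ((z - zbar[:, None]) / sigma[:, None]).sum(axis=0)     # eq. (52)
    sigma_u = u.std()
    theta = sigma_u**2 / n                                     # eq. (68)
    b = n * sigma[c] / sigma_u                                 # eq. (63)
    s = sigma[c] / sigma                                       # eq. (64)
    m = zbar[c] - sigma[c] * zbar / sigma                      # eq. (65)
    t66 = zbar[c] + sigma[c] * u / theta                       # eq. (66)
    t67 = zbar[c] + (s[:, None] * z + m[:, None] - zbar[c]).sum(axis=0) / theta   # eq. (67)
    return s, m, t66, t67, b


rng = np.random.default_rng(5)
n, k, c = 9, 6, 2                      # c is a 0-based index chosen arbitrarily
z = np.linspace(-1.2, 1.2, k) * rng.uniform(0.8, 1.3, (n, 1)) + rng.normal(0.0, 0.2, (n, k))
s, m, t66, t67, b = criterion_scaling(z, c)
print(s[c], m[c], np.max(np.abs(t66 - t67)))
```

The criterion stimulus receives $s_c = 1$ and $m_c = 0$, and the two definitions of $t_g'$ coincide.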
A Solution for the Matrix of Incomplete Data


In the typical scaling problem the range of the scale is usually very
large in comparison with the discriminal dispersion of any one of the stimuli.
There is little or no overlap between the extreme stimuli, so that many
experimental values for the $z_{ig}$ will be missing. This means that many values
of $r_{ij}$ are not given by the data and also that any values for the mean and
standard deviation of the $z_{ig}$ would be useless, since they would be systematic-
ally biassed by the fact that the large positive values or the large negative
values or both would be missing. A very ingenious exact solution for the
matrix of incomplete data has been presented by Horst (4). This solution is
laborious, so that simpler approximations are desirable. Values for the $s_i$
or $m_i$ which will not be systematically biassed by missing values from one
or both ends of the scale may be found by the procedure developed by
Thurstone [see Saffir (7)]. Thurstone's procedure may be described as a two-point
method of fitting a straight line.

A variant of this procedure, which follows more directly from the fore- 
going least squares procedure, can be characterized as a point and slope 
method of fitting a straight line, and may be described in the following steps: 


(1) Order the stimuli so that there will be as much overlap as possible between adjacent stimuli. As a first approximation, ordering in terms of the rank order of the medians may be used and changed only if a shift will clearly produce more overlap between adjacent stimuli. This procedure will give a maximum number of pairs of values $z_{ig}$ and $z_{i+1,g}$ for all values of i from 1 to n − 1.

(2) Calculate the mean and standard deviation of the $z_{ig}$ and $z_{i+1,g}$ using only the matching values of g. Thus one would have n − 1 pairs of means and n − 1 pairs of standard deviations, which may be designated by

$$\dot z_{i},\ \dot z_{i+1},\ \dot\sigma_{i},\ \dot\sigma_{i+1} \qquad (i = 1, \ldots, n - 1).$$

The dot indicates that only matching values of $z_{ig}$ and $z_{i+1,g}$ are used.

(3) Calculate n − 1 ratios of standard deviations, setting

$$\frac{s_i}{s_{i+1}} = \frac{\dot\sigma_{i+1}}{\dot\sigma_{i}} \qquad (i = 1, \ldots, n - 1).$$

(4) Assign an arbitrary value such as unity to a specified $s_i$ and then use the ratios given in step (3) to determine the other n − 1 values of $s_i$.

(5) Using the $s_i$ values obtained in step (4) and the paired means from step (2), calculate the n − 1 differences

$$m_{i+1} - m_i = s_i\,\dot z_{i} - s_{i+1}\,\dot z_{i+1} \qquad (i = 1, \ldots, n - 1).$$

(6) Assigning an arbitrary value such as zero to a specified $m_i$, the differences of step (5) may be used to determine the remaining n − 1 of the $m_i$.
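Steps (2) to (6) can be sketched as follows. This is one possible implementation under stated assumptions: the stimuli are already ordered as in step (1), missing $z_{ig}$ are coded as `np.nan`, and the first stimulus serves as the reference with s = 1 and m = 0. The example uses noiseless hypothetical data, so the recovered values should match the generating ones after the change of unit and origin.

```python
import numpy as np


def chain_scale(z):
    """Steps (2)-(6) for an ordered n x k matrix z with np.nan for missing z_ig.
    Stimulus 0 is the reference: s_0 = 1 (step 4) and m_0 = 0 (step 6)."""
    n = z.shape[0]
    s, m = np.empty(n), np.empty(n)
    s[0], m[0] = 1.0, 0.0
    for i in range(n - 1):
        both = ~np.isnan(z[i]) & ~np.isnan(z[i + 1])   # matching values of g only, step (2)
        zi, zj = z[i, both], z[i + 1, both]
        s[i + 1] = s[i] * zi.std() / zj.std()          # step (3): s_i/s_{i+1} = sd_{i+1}/sd_i
        m[i + 1] = m[i] + s[i] * zi.mean() - s[i + 1] * zj.mean()   # step (5)
    return s, m


# Hypothetical noiseless example: stimuli already ordered (step 1), extremes missing.
t_true = np.linspace(-3.0, 3.0, 8)
m_true = np.linspace(-2.0, 2.0, 6)
s_true = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3])
z = (t_true - m_true[:, None]) / s_true[:, None]
z[np.abs(z) > 2.5] = np.nan          # little overlap between extreme stimuli

s, m = chain_scale(z)
print(np.round(s, 4), np.round(m, 4))
```

Each adjacent pair here shares at least five boundaries; with real data a pair with fewer than two matching values would leave its ratio undefined.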


The problem is to assign values to $s_i$, $m_i$, and $t_g$ which will minimize E.
Since $m_i$ and $s_i$ have been determined in terms of the unit and origin implied
by the arbitrary assignment of $s_i$ in step (4) and $m_i$ in step (6), we should
now assign $t_g$ values which agree in the sense of minimizing E. This may be
approximated in the following manner.


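(7) Define a provisional value

$$\hat t_g = \frac{1}{n_g}\sum_{i}^{\boldsymbol{\cdot}}\left(s_i z_{ig} + m_i\right),$$

where the dot is used to indicate that there may be missing values of the $z_{ig}$, hence $n_g < n$. Also calculate

$$\bar{\hat t} = \frac{1}{k}\sum_{g=1}^{k}\hat t_g.$$

(8) Estimate $n/\theta$ by

$$\frac{n}{\theta} \approx f + 1,$$

where

$$f = \frac{\sum_{g=1}^{k}\sum_{i}^{\boldsymbol{\cdot}}\left(s_i z_{ig} + m_i - \hat t_g\right)^{2}}{\left(\frac{1}{k}\sum_{g=1}^{k} n_g\right)\sum_{g=1}^{k}\left(\hat t_g - \bar{\hat t}\right)^{2}}.$$

(9) Calculate the $t_g$ by

$$t_g = (f + 1)(\hat t_g - \bar{\hat t}) + \bar{\hat t} = (f + 1)\,\hat t_g - f\,\bar{\hat t},$$

and $\bar t$ by

$$\bar t = \frac{1}{k}\sum_{g=1}^{k} t_g.$$

(10) The error E as given by equation (1) is determined from kn values. In order to make some reasonable comparison for the matrix of incomplete data, E/kn may be compared with $\bar E/kn$, where

$$\frac{\bar E}{kn} = \frac{1}{b^{2}\sum_{g=1}^{k} n_g}\sum_{g=1}^{k}\sum_{i}^{\boldsymbol{\cdot}}\left(s_i z_{ig} + m_i - t_g\right)^{2}, \qquad b^{2} = \frac{1}{k}\sum_{g=1}^{k}\left(t_g - \bar t\right)^{2}.$$

An approximate check to see that the results are not being distorted seriously by the missing values may be found by seeing if

$$\frac{\bar E}{kn} \approx \frac{f}{f + 1}.$$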
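Steps (7) to (10) can be sketched as below. This is a sketch under stated assumptions, not a definitive implementation: missing $z_{ig}$ are coded as `np.nan`, and f and $\bar E/kn$ are computed with the forms written in steps (8) and (10). Under those forms, with complete data $\bar E/kn$ equals $f/(f+1)$ exactly, so the check is informative only when values are missing; with noiseless data and correct $s_i$, $m_i$, f is zero and the $t_g$ are recovered.

```python
import numpy as np


def boundaries_from_incomplete(z, s, m):
    """Steps (7)-(10); missing z_ig are np.nan (hypothetical helper)."""
    k = z.shape[1]
    fitted = s[:, None] * z + m[:, None]                # s_i z_ig + m_i, nan where missing
    n_g = (~np.isnan(fitted)).sum(axis=0)               # number of available values per g
    t_hat = np.nanmean(fitted, axis=0)                  # step (7)
    t_hat_bar = t_hat.mean()
    within = np.nansum((fitted - t_hat) ** 2)
    f = within / ((n_g.sum() / k) * ((t_hat - t_hat_bar) ** 2).sum())   # step (8)
    t = (f + 1) * t_hat - f * t_hat_bar                 # step (9)
    b2 = ((t - t.mean()) ** 2).mean()                   # b^2 = (1/k) sum (t_g - t_bar)^2
    ebar_kn = np.nansum((fitted - t) ** 2) / (b2 * n_g.sum())           # step (10)
    return t, f, ebar_kn


# (a) Complete hypothetical data: the check of step (10) holds exactly.
rng = np.random.default_rng(6)
z_full = np.linspace(-1.0, 1.0, 6) + rng.normal(0.0, 0.3, (10, 6))
s_full = 1.0 / z_full.std(axis=1)
m_full = -z_full.mean(axis=1) * s_full
t_c, f_c, e_c = boundaries_from_incomplete(z_full, s_full, m_full)

# (b) Noiseless incomplete data with known s_i and m_i: f is 0 and t_g is recovered.
t_true = np.linspace(-3.0, 3.0, 8)
m_true = np.linspace(-2.0, 2.0, 6)
s_true = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3])
z_inc = (t_true - m_true[:, None]) / s_true[:, None]
z_inc[np.abs(z_inc) > 2.5] = np.nan
t_i, f_i, e_i = boundaries_from_incomplete(z_inc, s_true, m_true)
print(e_c, f_c / (f_c + 1), f_i)
```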
Scale values ($m_i$), discriminal dispersions ($s_i$) and category boundaries
($t_g$) calculated on Saffir's data are given in the first column of Tables 4, 5,
and 6 respectively. These are in reasonably close agreement with Saffir's
values for $m_i$ and $s_i$, which are given in the second column of Tables 4 and 5.
Since Saffir presented no values for $t_g$, the category boundaries, this compari-
son cannot be made for Table 6.


Conclusion 


A least squares solution has been presented for the law of categorical 
judgment and the method of successive intervals. This solution is formally 
equivalent to Horst’s solution for the matrix of incomplete data. A simplified 
approximation for the matrix of complete data has been given as well as a 
procedure which is not systematically biassed when used on the typical 
experimental matrix of incomplete data. 

This method is based on solving simultaneously for the scale values ($m_i$), the discriminal dispersions ($s_i$), and the category boundaries ($t_g$) which will minimize the quantity

$$\frac{1}{b^{2}}\sum_{i=1}^{n}\sum_{g=1}^{k}\left(s_i z_{ig} + m_i - t_g\right)^{2},$$

where the $z_{ig}$ are transforms of the experimentally determined proportions and b is an arbitrarily assigned standard deviation of the $t_g$.

Since the complete least squares solution is unnecessarily laborious, 
a simplified approximation is also presented, see equations (59), (60), and 
(61), which is recommended for use with the matrix of complete data. In 
order to adapt this development to the typical experiment in which one has 
a matrix of incomplete data, a procedure is outlined which is similar to the 
least squares solution but is not systematically biassed by missing values in 
the experimental matrix. This procedure is analogous to the one originally 
devised by Thurstone and presented by Saffir. These methods have been
applied to the data presented by Saffir.
A RATIONALE FOR MINIMIZING DISTORTION IN
PERSONALITY QUESTIONNAIRE KEYS 


Hubert E. Brogden


THE PERSONNEL RESEARCH BRANCH, AGO*


A rationale and a procedure for constructing questionnaire keys so 
as to minimize the effect of distortion or faking by the respondents is de- 
veloped. This rationale is based on the supposition that suppressor items 
can be identified to reduce distortion in, and thus add to, the validity of 
questionnaire keys. The procedure is designed primarily for application 
to the construction of forced choice items. 

The results of an empirical test of the efficacy of the forced choice
pairing procedure developed as a consequence of this rationale are presented.
A key based on forced choice pairs developed by this procedure gave a 
validity of .33; a second key based on forced choice pairs developed by more 
conventional procedures gave a validity of .23. 


Introduction 


Personality questionnaires have been generally criticized as measuring 
instruments because of the belief that the respondents can and do, either 
consciously or unconsciously, distort their responses. Faking or distortion 
is considered especially serious when questionnaires are used for employ- 
ment or similar purposes and is usually believed to impair seriously the 
validity of the resulting questionnaire scores. This paper is concerned with 
development of a rationale and a method for improving the validity of 
personality questionnaire keys by minimizing distortion. 

At least two prior attempts at the control of distortion have shown some 
success. The K scale (1) of the Minnesota Multiphasic Personality Inventory 
was developed as an attempt at avoiding distortion in questionnaire responses. 
A high score on the K scale indicated a tendency to cover up and avoid
responses indicative of abnormal tendencies. Evidence was presented showing 
that the K scale has the statistical qualities of a suppressor and adds to the 
validity of the several scales of the MMPI. 

The most ambitious and perhaps the most successful attempt to avoid 
distortion is found in the forced choice method. The nature of forced choice 
items and the general procedure involved in their construction has been 
described by Sisson (2). Stated briefly, a forced choice item is a pair of 
alternatives matched with respect to “preference” or social desirability but


*The opinions expressed are those of the author and are not to be construed as 
reflecting official Department of the Army policy. The rationale presented herein was 
developed and used in an Army research program concerned with the selection of ROTC 
cadets. In substance, the content of this paper is contained in PRS Report 868, A rationale 
for minimizing distortion in personality questionnaire keys. 










142 PSYCHOMETRIKA 


as divergent as possible in validity. There are a number of more recent 
modifications of the general procedure, one of which will be discussed later in 
this report. 

Usable validity has been well established for a number of questionnaires 
based on forced choice principles and used, primarily, in the prediction 
of success of military leaders, both commissioned and non-commissioned 
(3, 4, 5, 6, 7, 9). Coefficients from .3 to .4 appear to be the rule. Consistently 
lower validities have been obtained with questionnaire items in yes-no form 
when analysis and keying followed conventional procedures and made no 
attempt to reduce distortion (8, 9, 10). 

Validities in the .30’s may seem unpromising in view of standards set by 
most textbooks in the applied or testing field. A critical review of the litera- 
ture, with consideration of cross-validation principles and the need for 
sizable samples to obtain a stable estimate of validity, suggests that for 
personality measures validities of such magnitude are, actually, quite promis- 
ing. In predicting success of leaders, in particular, high validity by any 
standard is difficult to attain. 

In any event, forced choice procedures appear to offer the most promising 
lead to the control of distortion or faking in questionnaire responses. The 
methodological developments to be described in the present report are believed 
to provide the basis for further improvement. 


The Rationale 


The rationale of the procedure to be described grew out of the simple 
idea that the forced choice method works because, in essence, variance in the 
questionnaire score due to distortion is reduced by suppressor action when the 
yes-no content is converted to forced choice form. A forced choice pair may 
be regarded as a difference score and, consequently, the invalid (or less 
valid) alternative of the pair may be regarded as having unit negative weight. 
The negative weight is desirable only if the alternative has the statistical 
characteristics of a suppressor. It is an obvious hypothesis, then, that the 
success of the technique is due to the fact that the invalid alternative reduces 
the distortion variance by suppressor action. 

Although the success of forced choice procedures is believed due to 
suppression of distortion, it seems evident that the technique in current use 
for the construction of forced choice questionnaires is not explicitly oriented 
to accomplish such suppressor action. In so far as distortion of responses 
attenuates validity, it must contribute invalid variance to the questionnaire 
score. Current forced choice procedures seek to eliminate distortion by 
requiring choice between two items having equal average ratings on social 
desirability. In computing such average ratings, individual differences in 
tendency to distort are ignored. Hence, such averages bear no explicit
relationship to the reduction of an invalid source of variance. It seems evident
that correlational techniques, directly oriented toward reduction of dis- 
tortion as a source of variance, will prove more effective. This, briefly, is 
the line of reasoning basic to the rationale to be presented in this paper. 

As an aid in understanding the procedure, an early version, discarded 
because of practical and theoretical difficulties, will be described briefly. 
Suppose that item validity coefficients have been computed and that a 
scoring key has been determined in the usual manner by keying these items 
with high validity. The questionnaire is then administered twice to the same 
sample, once under conditions typical of those prevailing when the instrument 
is used for actual selection and again with directions and conditions of ad- 
ministration such as to elicit frank, undistorted responses. Two scores are 
then available for each case. The difference between these two scores would 
give, for each individual, the extent of distortion occurring. 

Given scores showing the extent to which each individual distorted, 
correlations between the item responses and the distortion score could then 
be computed. These correlations could be used in forced choice pairing. A 
resulting forced choice pair would consist of two items with as large a differ- 
ence in validity as possible but with equal correlations with the distortion 
score. The responses to the pair are analogous to difference scores. Such 
responses should, consequently, have near zero correlation with the distortion 
score. 

This early version was discarded because of the difficulty in devising 
the conditions for the second administration of the questionnaire so that 
frank responses could be obtained. It seemed that subjects would be suspicious 
of a second administration and would remember and repeat their previous 
responses to avoid the possibility that a comparison between the two ad- 
ministrations would be made and used to identify those who exaggerated in 
the first administration. In the author’s opinion, a theoretical defect of this 
procedure is that it is limited to instances where distortion is of conscious 
origin. 

A second procedure was adopted to avoid the difficulties described above. 
As a starting point in this second procedure, we will represent scores on a 
questionnaire key (x) as the sum of a component common to the key and the 
criterion (y), a distortion component (d), components specific to the score 
other than distortion (s), and error (e). The scoring key is presumed to have 
been constructed in the conventional manner by computing item validity 
coefficients and keying those items with high validity. Any score on the 
questionnaire key could, then, be represented by the equation


$$x = y + d + s + e. \tag{1}$$


If independent estimates of the components other than d were available, 
an estimate of d could be obtained by subtraction. From the practical
viewpoint of maximizing validity, it would be desirable to include s with d and to
attempt to minimize both of these invalid components of the test score. We 
are then faced only with the problem of estimating the y component. 

In order to secure an estimate of the score component y, the role of the 
test and the criterion will be reversed and the equation showing the regression 
of the test on the criterion will be used to predict the test score. The equation 
used is 

$$\hat y = \hat x = b_{xc}\,c + A, \tag{2}$$


where c is the external criterion and A the regression constant. 

If we disregard the regression constant and take the remaining right-hand 
entry as an estimate of y, we can substitute in (1) and after rearrangement 
derive the following estimate of the distortion score: 


$$d_i + s_i + e_i = x_i - b_{xc}\,c_i. \tag{3}$$
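Equation (3) can be sketched as follows, assuming the regression weight $b_{xc}$ is estimated by ordinary least squares from a sample. All data are simulated, and the variable names and the generating model are hypothetical. By construction, the resulting distortion score is uncorrelated with the criterion in the sample used to compute $b_{xc}$.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500
criterion = rng.normal(size=N)                          # external criterion c (simulated)
distortion = rng.normal(size=N)                         # hypothetical tendency to distort
key_score = 0.5 * criterion + 0.8 * distortion + rng.normal(0.0, 0.5, N)   # x = y + d + s + e

# Regression of the test on the criterion (eq. (2)); the constant A is disregarded.
b_xc = np.cov(key_score, criterion)[0, 1] / np.var(criterion, ddof=1)
distortion_score = key_score - b_xc * criterion         # eq. (3): d + s + e

print(np.corrcoef(distortion_score, criterion)[0, 1])   # essentially zero
print(np.corrcoef(distortion_score, distortion)[0, 1])  # tracks the simulated distortion
```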


The development of forced choice pairs not susceptible to distortion may 
be described as follows. Item validities are computed against an external 
criterion and the most valid items selected for keying. The test papers are 
then scored using the key thus developed. By use of the formula given in (3), 
distortion scores are determined for each individual of the population. 
Biserial correlations between each item in the test and this distortion score 
are computed to yield what will be labelled distortion indexes. The percentage 
answering “yes” (the p value) to each of the items is obtained as a by-product
in computing both the item validities and the distortion indexes. The validity, 
the distortion index, and the p value are thus known for each item of the 
experimental questionnaire. 

The data basic to the foregoing computations can be obtained with a 
single sample, although two samples are preferable, one to obtain the neces- 
sary data for computing the item validities, and the second to obtain the data 
necessary for computing the distortion indexes. Without the second sample, 
bias in the distortion indexes would be expected. 

At this point, it should be emphasized that in administering the experi- 
mental questionnaire to obtain data for computing the distortion indexes, it 
is essential that the conditions be representative of those obtained in operating 
use of the instrument. It is essential, in other words, that the questionnaire 
be administered to genuine applicants and that the applicants believe that 
their responses will have bearing upon the outcome of their application. 
Otherwise, distortion of the responses occurring in the experimental adminis- 
tration will not be obtained or it will not be typical of those obtained under 
operating conditions. 

From this point on the procedure follows that conventionally employed 
in constructing forced choice pairs except that distortion indexes are used in 
place of preference values. That is to say, the alternatives are paired so as 
to have equal distortion indexes and equal p values, and at the same time, as
disparate item validities as possible. The resulting pairs are then readmin- 
istered and item analysis and cross validation accomplished in the usual 
manner. 
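The pairing rule just described (equal distortion indexes and p values, maximally different validities) can be sketched as a simple greedy search. This is only one possible implementation, not the author's procedure. The tolerances standing in for “equal,” the item records, and the function name are all hypothetical.

```python
def pair_items(items, tol_d=0.05, tol_p=0.10):
    """Greedy pairing: similar distortion index and p value, maximal validity gap."""
    unpaired = sorted(items, key=lambda it: it["validity"], reverse=True)
    pairs = []
    while unpaired:
        first = unpaired.pop(0)                  # most valid remaining item
        candidates = [it for it in unpaired
                      if abs(it["distortion"] - first["distortion"]) <= tol_d
                      and abs(it["p"] - first["p"]) <= tol_p]
        if not candidates:
            continue                             # no acceptable partner; leave unpaired
        partner = min(candidates, key=lambda it: it["validity"])   # largest validity gap
        unpaired.remove(partner)
        pairs.append((first["name"], partner["name"]))
    return pairs


# Hypothetical item statistics.
items = [
    {"name": "A", "validity": 0.30, "distortion": 0.40, "p": 0.55},
    {"name": "B", "validity": 0.02, "distortion": 0.42, "p": 0.60},
    {"name": "C", "validity": 0.25, "distortion": 0.10, "p": 0.35},
    {"name": "D", "validity": -0.05, "distortion": 0.12, "p": 0.30},
    {"name": "E", "validity": 0.10, "distortion": 0.41, "p": 0.50},
]
pairs = pair_items(items)
print(pairs)
```

A greedy search is simple but need not find the best overall set of pairs; an exhaustive or matching-based search could be substituted.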

Pairing on p values is not regarded as essential in developing forced 
choice pairs. It is likely, however, that the p value is itself an indirect prefer- 
ence measure and that it will serve to buttress the effect of pairing on dis- 
tortion indexes. Pairing on p value should also eliminate pairs with p values 
close enough to 1.00 or zero to reduce their effectiveness in discrimination.

To this point, we have discussed the rationale and described a procedure 
for developing pairs. Such pairs may be shown to be free of distortion by 
demonstrating that their correlation with the distortion score approximates 
zero. We must assume, however, that the correlational properties of the 
alternatives forming the pair are unaltered when these alternatives are pre- 
sented as pairs. 

Suppose that two yes-no items, a and b, have been paired according to 
the procedure described. The numerator of the formula for computing the 
correlation of the distortion score with the difference between items a and b 
is the difference between $r_{da}\sigma_d\sigma_a$ and $r_{db}\sigma_d\sigma_b$. The standard deviation of the
distortion score, $\sigma_d$, is a constant in this difference. As a consequence of the
pairing procedure, $\sigma_a$ equals $\sigma_b$ and $r_{da}$ equals $r_{db}$. Since all three elements of
$r_{da}\sigma_d\sigma_a$ are equal to all three elements of $r_{db}\sigma_d\sigma_b$, the numerator of the correlation
of differences is zero, and the correlation of the difference with the
distortion score is zero. $\sigma_a$ equals $\sigma_b$ since the p values of a and b were equated.
The $\sigma$ of a dichotomous item is $\sqrt{pq}$, q being defined as $1 - p$.
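A small numerical check of this argument follows. The function names `item_sd` and `difference_numerator` and the toy responses are hypothetical. The point is only that the numerator $r_{da}\sigma_d\sigma_a - r_{db}\sigma_d\sigma_b$ equals the covariance of the distortion score d with $a - b$, and that it vanishes when the two items have equal p values and equal correlations with d.

```python
import numpy as np

def item_sd(x):
    """SD of a 0/1 item: sqrt(p*q) with q = 1 - p (population form)."""
    p = np.mean(x)
    return np.sqrt(p * (1 - p))

def difference_numerator(d, a, b):
    """Numerator of corr(d, a - b): r_da*s_d*s_a - r_db*s_d*s_b."""
    d = np.asarray(d, dtype=float)
    r_da = np.corrcoef(d, a)[0, 1]
    r_db = np.corrcoef(d, b)[0, 1]
    s_d = d.std()  # population SD, matching item_sd
    return r_da * s_d * item_sd(a) - r_db * s_d * item_sd(b)

# Toy data (hypothetical): a and b both have p = .5 and correlate equally with d.
d = np.array([3.0, 1.0, 1.0, 0.0])
a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
num = difference_numerator(d, a, b)
```

With these data `num` is zero up to rounding. For arbitrary responses the same function reproduces the population covariance of d with $a - b$, which is why equating both factors makes the correlation of the difference with d vanish.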

In passing, it should be remarked that the rationale and procedures 
could be modified and used without the forced choice format. For example, a 
set of valid items might be selected together with a second set of suppressor 
items, the latter having near zero validity but high distortion indexes. 
Negative weighting of the suppressor key will lead to an over-all score in 
which the variance due to distortion is minimized. It seems likely, also, that 
the distortion indexes will allow identification of items not subject to dis- 
tortion. A key consisting of such neutral items would seem to require neither 
forced choice format nor separate suppressor keys. 
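The suppressor alternative can be sketched as a weighted difference of two keyed scores. The paper does not specify the weight. As an assumption, the sketch below uses the least-squares slope of the valid-key score on the suppressor-key score, which makes the composite exactly uncorrelated with the suppressor key in the sample. The simulated trait, distortion and noise components are hypothetical.

```python
import numpy as np

def suppressed_composite(valid_score, suppressor_score):
    """Valid-key score minus a weighted suppressor-key score.

    Assumed weight: least-squares slope of the valid score on the suppressor
    score, which leaves the composite uncorrelated with the suppressor key.
    """
    v = np.asarray(valid_score, dtype=float)
    s = np.asarray(suppressor_score, dtype=float)
    w = np.cov(v, s, bias=True)[0, 1] / s.var()
    return v - w * s, w

# Simulated illustration (all quantities hypothetical).
rng = np.random.default_rng(1)
trait = rng.standard_normal(500)
distortion = rng.standard_normal(500)
valid = trait + distortion                                  # valid key also picks up distortion
suppressor = distortion + 0.3 * rng.standard_normal(500)    # near-zero validity, high distortion
composite, w = suppressed_composite(valid, suppressor)
```

With these simulated data the composite correlates far less with the distortion component than the raw valid-key score does. With real keys, how much distortion variance is removed depends on how purely the suppressor key measures distortion.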


An Empirical Test 


An empirical test of the forced choice rationale and procedure will now be 
described. This test was undertaken in connection with the development of a 
self-description blank for predicting leadership performance of ROTC cadets 
(9). The content included personality items of the sort appearing in the 
Bernreuter Inventory or Guilford’s GAMIN, interest or preference items, 
self-estimates of ability, and a few concerned with attitudes or beliefs.
One random half of this pool of items was administered in straight yes-no
questionnaire form to 780 ROTC cadets at the Virginia Polytechnic Institute. 
The subjects were asked to give frank responses and were assured that the 
results were for research purposes only. Item biserial validities were then 
computed against associates’ ratings on leadership potential. 

These items were also administered to 940 cadets at Texas A and M 
College, along with a number of other tests which were being used to de- 
termine which of the ROTC cadets tested would be exempted from selective 
service. At the time the tests were administered at Texas A and M College,
it appeared probable that with the quota then assigned, approximately 
50 per cent of the cadets would not be exempted from selective service. Since 
cadets were, in any event, required to continue taking military training, it 
would be considerably to their advantage to gain exemption from selective 
service. For this reason, it is believed that responses of the cadets were 
typical of an applicant-like situation. Using the procedure described earlier, 
distortion indexes were then computed. 

A modification of conventional forced choice procedures was employed 
in analyzing the second half of the pool of items. The sample used to compute 
item validities consisted of 470 cadets at the Citadel. The basic data for 
computing preference indexes were obtained from 300 cadets at Clemson 
College. These cadets were asked to assume themselves to be applicants for 
admission to ROTC and to indicate (on a five-point scale) how important it 
would be in furthering their chances to make an affirmative response to each 
item. The values thus obtained were then averaged and used as preference 
indexes for the items. 

To permit comparison of the two procedures, 120 distortion pairs and 
120 mean rated preference pairs were constructed. Each set of 120 was
paired in the same way. The alternatives were equated on distortion or on 
mean preference indexes and on the percentage answering yes to each item. 
As disparate validities as possible were maintained for the two alternatives 
of each pair. All 240 pairs were keyed according to the item validities used in 
pairing. 

Forms containing the two sets of experimental pairs were then adminis- 
tered to a sample of 400 cadets from military schools at summer camp in 
1949. The criterion employed, described more fully in (9), was a composite 
of leadership ratings. 

The 120 pairs based on rated preference indexes gave a validity coefficient 
of .23, while those paired on distortion indexes gave a validity coefficient of 
.33. The latter coefficient is significantly higher at the 5 percent level of 
confidence. (The 240 pairs used for the experimental test were only a portion 
of those developed for possible use in the final ROTC self-description blank. 
The final key gave a cross validity of .42.) 

The empirical findings confirm the initial hypothesis. A forced choice
procedure explicitly designed to reduce distortion as a source of variance
proved to be superior to conventional forced choice procedures based on
mean preference ratings.

It should be stressed that many subsidiary hypotheses or assumptions 
made in developing the rationale and the procedures were not directly tested 
and cannot be regarded as proved. The findings do give them some support, 
however, in that the general procedure developed from these assumptions 
proved to be of value. 

We might note in particular that the source of variance assumed to be 
distortion was not identified by the empirical analysis. It is pertinent to note, 
consequently, that examination of individual items having high distortion 
indexes as opposed to those with low distortion indexes indicated clearly that 
the general social desirability of the traits or behaviors described together 
with their pertinence to leadership played a major role in determining the 
distortion indexes of the items. Additional evidence, more quantitative in 
nature, would be desirable to define more exactly the nature and role of 
distortion or faking and to determine whether further modifications in pro- 
cedure would be profitable. 
SOME NECESSARY CONDITIONS FOR COMMON-FACTOR
ANALYSIS*


LOUIS GUTTMAN
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


Let R be any correlation matrix of order n, with unity as each main 
diagonal element. Common-factor analysis, in the Spearman-Thurstone sense, 
seeks a diagonal matrix $U^2$ such that $G = R - U^2$ is Gramian and of minimum
rank r. Let $s_1$ be the number of latent roots of R which are greater than or
equal to unity. Then it is proved here that $r \ge s_1$. Two further lower bounds
to r are also established that are better than $s_1$. Simple computing procedures
are shown for all three lower bounds that avoid any calculations of latent
roots. It is proved further that there are many cases where the rank of all
diagonal-free submatrices in R is small, but the minimum rank r for a Gramian
G is nevertheless very large compared with n. Heuristic criteria are given for 
testing the hypothesis that a finite r exists for the infinite universe of content 
from which the sample of n observed variables is selected; in many cases, 
a type of multiple common-factor structure cannot 



1. The Problem. One of the fundamental problems of common-factor 
analysis—in the sense of Spearman, Thurstone, and others—is as follows. 
Let R be the Gramian matrix of the intercorrelations among n observed
variables, with each main diagonal element equal to unity. Let $U^2$ be an
arbitrary diagonal matrix, with the jth main diagonal element denoted by
$u_j^2$, subject to the restrictions that:

$$0 \le u_j^2 \le 1 \qquad (j = 1, 2, \ldots, n). \qquad (1)$$

Let G be the symmetric matrix defined by

$$G = R - U^2. \qquad (2)$$

Find a $U^2$ which will leave G Gramian but with the smallest possible rank.

An analogous algebraic problem holds for a certain type of latent struc- 
ture hypothesis for dichotomies (9). The results of the present paper hold, 
with only a change in vocabulary, for obtaining lower bounds to the number 
of latent classes possible according to a given matrix of joint frequencies. 
For brevity, we shall refer here only to the common-factor problem. 

The importance of this problem lies in the fact that the minimum rank
for a Gramian G in (2) equals the minimum number of common-factors in a
system that can account exactly for the non-diagonal intercorrelations in
R (5).


*This research was made possible in part by an uncommitted grant-in-aid from 
the Behavioral Sciences Division of the Ford Foundation. 
That the elements of a $U^2$ satisfy restrictions (1) is no guarantee that
the corresponding G in (2) is Gramian. It is an essential part of the problem
that G be restricted to being Gramian. Non-Gramian G’s can often be found
with smaller rank than any possible Gramian G for the same R. Indeed, we
shall show an example below where a non-Gramian G of rank unity can be
found for a certain R, while the lowest possible rank for a Gramian G for the
same R is n − 1.

2. The Unknown Communalities and Uniquenesses. The jth diagonal 
element of a Gramian G in (2) is called a “communality” of the jth observed
variable, and is denoted by $h_j^2$. From (2) we have, considering the respective
main diagonal elements of G and R,

$$h_j^2 = 1 - u_j^2 \qquad (j = 1, 2, \ldots, n). \qquad (3)$$

From (1), which is actually a consequence of the restriction of G to being
Gramian, it must be that $0 \le h_j^2 \le 1$.

The quantity $u_j^2$ is called a “uniqueness” of the jth observed variable,
when G is Gramian.

Conventional empirical techniques for attempting to find a Gramian 
G of minimum rank usually proceed as follows. A trial matrix $U^2$ is first
used to define a G as in (2), and one or more common-factors are “extracted”
(usually by modifying the trial values of $U^2$ in the course of the computations)
until a matrix is built up which differs from R in the non-diagonal elements
only by “small” residuals.

3. Which Null Hypotheses Should be Tested? A common tendency in 
practice is to try to stop the factor “extractions” as quickly as possible. 
This implies that a series of null hypotheses is being tested. The first hypoth- 
esis is that the minimum rank for a Gramian G is unity. If this hypothesis 
is rejected, the next one tested is that the minimum rank is two, etc.

It may be argued that the sequence of hypotheses should be reversed. 
In general, an arbitrary set of n intercorrelated variables may have many 
factors in common. A small number of common-factors may be the exception, 
rather than the rule. From this point of view, the following should be the 
sequence of null hypotheses to be tested. First, that the minimum rank is n. 
This can always be rejected, as is well known (and as we shall see again 
below); a Gramian G of rank n — 1 can always be found for any non-singular 
R, usually in many different ways. The next hypothesis then is that the 
minimum rank is n — 2, which may or may not be true; we shall show ex- 
amples where this is not true, the minimum rank being n — 1. 

The new theory of order-factors, or the radex, shows vividly the danger 
of stopping at too few common-factors. For example, if R is a simplex matrix, 
then three or four “common-factors,” as computed by conventional
techniques, will often leave non-diagonal residuals that will be regarded as
“small” by any existing criterion. But stopping at this small number of
common-factors will obscure the actual mathematical and psychological
structure of the data (6, 7). [For example, in the case of a perfect simplex,
if $r_{jk}$ is the typical element of R, then $r_{jk} = a_j/a_k$ ($j \le k$), where $a_j$ is a certain
parameter belonging to the jth observed variable. The first four principal
components can often be shown to account for about 90% of the total variance
of such a set of observations, with each remaining principal component
accounting for but a small fraction of the remainder. The parsimony of the
perfect simplex lies not in the number of common-factors implied, but in the fact
that only one parameter per test is required to reproduce the observed correlation
coefficients.]
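To see the perfect simplex concretely, the sketch below builds a correlation matrix with $r_{jk} = a_j/a_k$ for $j \le k$ and prints the cumulative share of the total variance n taken up by the leading principal components. The parameter choice $a_j = \sqrt{j}$ and n = 12 are assumptions for illustration; the exact shares depend on the chosen $a_j$.

```python
import numpy as np

def perfect_simplex(a):
    """Correlation matrix with r_jk = a_j / a_k for j <= k, for increasing positive a."""
    a = np.asarray(a, dtype=float)
    idx = np.arange(len(a))
    lo = np.minimum.outer(idx, idx)   # smaller index of each (j, k)
    hi = np.maximum.outer(idx, idx)   # larger index of each (j, k)
    return a[lo] / a[hi]

n = 12
R = perfect_simplex(np.sqrt(np.arange(1, n + 1)))   # assumed parameters a_j = sqrt(j)
roots = np.sort(np.linalg.eigvalsh(R))[::-1]        # principal component variances
cumulative_share = np.cumsum(roots) / n             # share of the total variance n
print(np.round(cumulative_share[:4], 3))
```

With $a_j = \sqrt{j}$ this matrix is the correlation matrix of a random walk observed at n successive steps, a standard example of a simplex: a single parameter per variable reproduces every correlation, however many components are needed to exhaust the variance.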

In the present paper, we shall present partial safeguards against stopping 
factor “extractions” too soon. 

For finite n, we shall establish some lower bounds to the minimum rank 
possible for a Gramian G, making no assumptions whatsoever. To ascertain 
the minimum rank exactly seems to require solving the communality problem, 
and this has not yet been shown to have a general solution beyond trial- 
and-error (systematic trials will always yield a solution in any given case 
for finite n). 

Some authors have modified the communality problem to be: find a
$U^2$ for (2) such that G will be Gramian, and the rank of G shall be equal
to the order of the largest non-singular submatrix in R that does not involve
a main diagonal element. We shall prove that such a $U^2$ cannot exist for
many cases, or the revised problem can have no general solution. [This
modified problem has been suggested by Thurstone (12), and exact solutions 
have been proposed by Albert (1, 2) and Rosner (10). Unfortunately, these 
suggested solutions do not maintain the essential restriction that G be Gram- 
ian.] 

For the infinite universe which the finite sample of n variables represents, 
we shall present heuristic criteria as to whether a finite common-factor 
structure exists at all. The fact that a finite number of common-factors can 
be found to fit a finite number of variables can be a mathematical artifact 
due purely to the finiteness of the sample of variables. Inference about the 
universe is the basic scientific problem for common-factor theory. [This 
point was raised before in the different context of image analysis (8), where 
a different type of heuristic criteria was proposed. The two types of criteria 
are supplementary to each other.] 

In the present paper, we do not treat the problem of ordinary sampling 
error, that is, of sampling a population of respondents. We assume throughout 
that population parameters are used, and not sample statistics. The only 
sampling problem we shall discuss is that of selecting variables from the 
universe of content, which cannot usually be treated by the ordinary theory 
of random sampling. 

4. Definition of the Non-negative Index of a Real Symmetric Matrix;
Sylvester’s “Law of Inertia.” Let S be any real symmetric matrix of order
n and rank t. Then it is well known that S is congruent to many different
real diagonal matrices. That is, there are many real non-singular matrices 
P such that, if P’ is the transpose of P, then 


$$PSP' = D, \qquad (4)$$


where D is some real diagonal matrix. 

Let p and q be the number of positive and of negative diagonal elements
respectively of D. Then p + q = t, for the rank of D must be that of S since
P and P’ are non-singular; and n − t is the number of zero diagonal elements
in D. Thus, we have a frequency distribution of the signs of the main diagonal
elements of D, and:

$$p + q + (n - t) = n. \qquad (5)$$


Sylvester’s “law of inertia” (cf. 3) states that each of the three frequencies
on the left of (5) is invariant under congruence transformations of S. That is,
if P in (4) is replaced by any real non-singular P* that results in D being
replaced by any other diagonal matrix D*, then the frequencies of positive,
negative, and zero main diagonal elements of D* will remain p, q, and n − t
respectively. Sylvester’s theorem is basic to the theory and practice of the 
present paper. 

The frequency p is usually called the index of S. We shall find it more 
convenient to call p the positive index of S, to distinguish it from another 
index we wish to use as well. Let s be the number of non-negative diagonal 
elements of a D in (4), or: 


$$s = p + (n - t) = n - q. \qquad (6)$$


Then we shall call s the non-negative index of S; it is clearly also invariant 
under congruence transformations. The practical use of our results below 
requires the computing of s for certain real symmetric matrices derived 
from R. 

An important special case of a congruence transformation of S is where 
P is an orthogonal matrix in (4). Then the main diagonal elements of D are
the latent roots of S. This suggests a practical way of computing s. Compute
the characteristic polynomial of S; this immediately determines t. Then p
and q can be determined merely by inspection of the coefficients of the 
polynomial, using Descartes’ rule of signs. 
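The characteristic-polynomial route can be sketched as follows. For a real symmetric S every root of $\det(xI - S)$ is real, so Descartes' rule of signs gives p exactly as the number of sign changes in the coefficients of $f(x)$, and q as the number in $f(-x)$. The rank t is n minus the number of trailing zero coefficients. This is a minimal sketch with hypothetical function names. `numpy.poly` obtains the coefficients from eigenvalues internally, so the code illustrates the counting rule rather than avoiding root computation, and the tolerance `tol` is an assumption for floating-point data.

```python
import numpy as np

def sign_changes(coeffs, tol=1e-9):
    """Count sign changes in a coefficient sequence, skipping (near-)zero terms."""
    signs = [np.sign(c) for c in coeffs if abs(c) > tol]
    return sum(1 for x, y in zip(signs, signs[1:]) if x != y)

def indices_by_descartes(S, tol=1e-9):
    """Return (p, q, t, s) for a real symmetric S from its characteristic polynomial."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    c = np.real(np.poly(S))            # coefficients of det(xI - S), x^n first
    zero_roots = 0                     # trailing zero coefficients = multiplicity of root 0
    for coef in c[::-1]:
        if abs(coef) > tol:
            break
        zero_roots += 1
    t = n - zero_roots                 # rank of S
    p = sign_changes(c, tol)           # positive roots
    # f(-x): the coefficient c[i] multiplies x^(n - i), so flip it when n - i is odd.
    c_neg = [coef * (-1) ** (n - i) for i, coef in enumerate(c)]
    q = sign_changes(c_neg, tol)       # negative roots
    return p, q, t, p + (n - t)        # s = p + (n - t), as in (6)

example = indices_by_descartes(np.diag([2.0, -1.0, 0.0]))
```

For `np.diag([2.0, -1.0, 0.0])` the polynomial is $x^3 - x^2 - 2x$, which gives p = 1, q = 1, t = 2 and s = 2.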

Usually the easiest way of computing s may be simply to reduce S to
diagonal form by successive elementary transformations, and then to
count the non-negative diagonal elements in the resulting diagonal matrix.
For example, let $P_1$ be the elementary transformation that subtracts a
constant times the first row of S from the second to yield zero as the first
element of the second row, and compute $P_1 S P_1'$. Then let $P_2$ be another
elementary transformation to induce another zero element below the main
diagonal of $P_1 S P_1'$, etc. Then if we let $P = \cdots P_2 P_1$, we have $PSP'$ a
diagonal matrix from which s can be ascertained by inspection.
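The elementary-transformation route can be sketched as a symmetric Gaussian elimination in which every row operation is paired with the same column operation, so that the result is $PSP'$ without P ever being formed. The function names and the tolerance are hypothetical. Zero pivots are handled either by a symmetric swap or, when the remaining diagonal is all zero, by adding one row and column to another. Both are congruence transformations.

```python
import numpy as np

def congruence_diagonal(S, tol=1e-9):
    """Diagonalize a real symmetric S by paired row/column operations (P S P').

    Each row operation is mirrored by the same column operation, so the result
    is congruent to S and, by Sylvester's law of inertia, has the same counts
    of positive, negative and zero diagonal elements.
    """
    A = np.array(S, dtype=float)
    n = A.shape[0]
    for k in range(n):
        if abs(A[k, k]) <= tol:
            # Prefer a later non-zero diagonal element: swap it into position k.
            j = next((j for j in range(k + 1, n) if abs(A[j, j]) > tol), None)
            if j is not None:
                A[[k, j]] = A[[j, k]]
                A[:, [k, j]] = A[:, [j, k]]
            else:
                # Remaining diagonal is zero; adding row/column j to row/column k
                # makes the new A[k, k] equal to 2 * A[k, j].
                j = next((j for j in range(k + 1, n) if abs(A[k, j]) > tol), None)
                if j is None:
                    continue  # nothing left to eliminate in row/column k
                A[k] += A[j]
                A[:, k] += A[:, j]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i] -= m * A[k]          # row operation ...
            A[:, i] -= m * A[:, k]    # ... and the matching column operation
    return np.diag(A).copy()

def nonnegative_index(S, tol=1e-9):
    """Number of non-negative diagonal elements after the congruence reduction."""
    return int(np.sum(congruence_diagonal(S, tol) > -tol))

# S_1 = R - I for three variables with all intercorrelations equal to .5.
S1 = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
s1 = nonnegative_index(S1)
```

Here the reduction yields one non-negative diagonal element, matching the single latent root of R (namely 2) that is not less than unity.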

The practical results of the present paper are presented in the next 
two sections. The proof underlying these results is given in §8 below, and 
further points of interest are then established in the final two sections. 

5. The Three Lower Bounds. Let r be the (unknown) minimum rank 
possible for a Gramian G in (2) for the given R. Let $s_1$ be the number of latent
roots of R which are greater than or equal to unity. Then we shall show
that $s_1$ is necessarily a lower bound to r:

$$r \ge s_1. \qquad (6)$$

Another way of defining $s_1$ is as follows. Let $S_1$ be the symmetric matrix
defined by

$$S_1 = R - I, \qquad (7)$$

where I is the unit matrix. That is, $S_1$ is obtained from R by replacing each
main diagonal element of R by zero. Let $s_1$ be the non-negative index of $S_1$.
Then $s_1$ is also the number of non-negative latent roots of $S_1$, or is the same
as the number of roots of R which are not less than unity, namely the $s_1$
defined in the preceding paragraph. Hence (6) holds also for this definition
of $s_1$.

An advantage of determining $s_1$ from $S_1$ rather than from R is to avoid
computing latent roots. Any elementary transformations of $S_1$ to diagonal
form, maintaining congruence, will suffice to determine $s_1$, as noted in the
preceding section. 

The second lower bound to r, $s_2$, is computed as follows. Let $R_j$ be the
multiple correlation coefficient of the jth observed variable on the remaining
n − 1 observed variables. Let $D_2$ be the diagonal matrix whose jth diagonal
element is $1 - R_j^2$ ($j = 1, 2, \ldots, n$), and let $S_2$ be the following difference
matrix:

$$S_2 = R - D_2. \qquad (8)$$

Let $s_2$ be the non-negative index of $S_2$. Then we shall show that $s_2$ is
necessarily a lower bound to r, or:

$$r \ge s_2. \qquad (9)$$

The diagonal matrix $D_2$ in (8) is easily computed from the inverse
of R when R is non-singular, which is the general case in practice. The
jth diagonal element of $D_2$ is simply the reciprocal of the corresponding
element of $R^{-1}$. The main diagonal elements of $S_2$ itself are, of course, the
$R_j^2$, in place of the unity elements of R.
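The reciprocal-of-the-inverse rule for $D_2$ can be checked against a direct regression. In the sketch below, `d2_diagonal` applies the rule just stated, `smc_by_regression` solves the normal equations for variable j on the other n − 1 variables, and the correlation matrix is generated from simulated data (an assumption for illustration). All three function names are hypothetical. The non-negative index in `s2_bound` is counted from eigenvalues for brevity; the congruence reduction of §4 would give the same count.

```python
import numpy as np

def d2_diagonal(R):
    """jth element 1 - R_j^2, computed as 1 / (R^{-1})_jj (R non-singular)."""
    return 1.0 / np.diag(np.linalg.inv(R))

def smc_by_regression(R, j):
    """R_j^2 from the normal equations of variable j on the other n - 1 variables."""
    others = [k for k in range(R.shape[0]) if k != j]
    r_xy = R[others, j]
    beta = np.linalg.solve(R[np.ix_(others, others)], r_xy)
    return float(r_xy @ beta)

def s2_bound(R, tol=1e-9):
    """Non-negative index of S_2 = R - D_2 (eigenvalues used only for brevity)."""
    S2 = R - np.diag(d2_diagonal(R))
    return int(np.sum(np.linalg.eigvalsh(S2) >= -tol))

rng = np.random.default_rng(3)
R = np.corrcoef(rng.standard_normal((40, 5)), rowvar=False)   # hypothetical data
smc = 1.0 - d2_diagonal(R)                                    # the R_j^2
```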

The calculation of $D_2$ may be prohibitive for large n. A weaker lower
bound than $s_2$, but better than $s_1$, is obtainable without such labor. Let
$r_j$ be the largest zero order correlation (in absolute value) that the jth observed
variable has with any of the n − 1 remaining observed variables;
that is, $r_j$ is the largest non-diagonal element (in absolute value) in the jth row (column) of R.
Let $D_3$ be the diagonal matrix whose jth main diagonal element is $1 - r_j^2$
($j = 1, 2, \ldots, n$), and let $S_3$ be the symmetric matrix defined by:

$$S_3 = R - D_3. \qquad (10)$$

Let $s_3$ be the non-negative index of $S_3$. Then it is necessarily true that

$$r \ge s_3. \qquad (11)$$


It will be further shown that, of the three lower bounds, $s_2$ is the best
and $s_1$ the weakest, or:

$$r \ge s_2 \ge s_3 \ge s_1. \qquad (12)$$


In practice, $s_3$ may be the most convenient to use. It is better than $s_1$,
and is not essentially more difficult to calculate. $S_3$ differs from R only by
replacing the latter’s diagonal elements of unity by the $r_j^2$. The $r_j^2$ are ascertainable
by mere inspection of R. It should be remarked that it is the $r_j^2$
and not the $r_j$ that are in the main diagonal of $S_3$. In Thurstone’s centroid
method (12), he uses the $r_j$ as estimates of the communalities; they may be
too high or too low, and in a manner which is not ascertainable in advance.
Here, we use the $r_j^2$ because they are necessarily lower bounds to, or always
underestimates of, the communalities.¹
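All three bounds can be computed together in a few lines. The sketch below is illustrative code, not the paper's procedure: it forms $S_1 = R - I$, $S_2 = R - D_2$ and $S_3 = R - D_3$ and, for brevity, counts non-negative eigenvalues rather than performing a congruence reduction. For a one-factor correlation matrix (with assumed loadings .8, .7, .6, .5) the minimum rank is r = 1, so by (12) all three bounds must equal 1.

```python
import numpy as np

def nonnegative_index(S, tol=1e-9):
    # Counted from eigenvalues for brevity; a congruence reduction gives the same count.
    return int(np.sum(np.linalg.eigvalsh(S) >= -tol))

def three_lower_bounds(R):
    """Return (s1, s2, s3) for a non-singular correlation matrix R."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    d2 = 1.0 / np.diag(np.linalg.inv(R))        # 1 - R_j^2
    off = np.abs(R - np.diag(np.diag(R)))        # |r_jk| with the diagonal zeroed
    r_max = off.max(axis=1)                      # r_j: largest |r_jk|, k != j
    d3 = 1.0 - r_max ** 2
    s1 = nonnegative_index(R - np.eye(n))        # S_1 = R - I
    s2 = nonnegative_index(R - np.diag(d2))      # S_2 = R - D_2
    s3 = nonnegative_index(R - np.diag(d3))      # S_3 = R - D_3
    return s1, s2, s3

# One-factor example (hypothetical loadings): here r = 1.
loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
print(three_lower_bounds(R))   # prints (1, 1, 1)
```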

6. The Frequency Distribution of the Latent Roots of R. Even if r were
known exactly for a given sample of n variables, this by itself would give
little information as to what r might be for a larger n from the same universe
of content, nor what the limit of r might be as $n \to \infty$.

We shall propose here some heuristic considerations concerning the
limit of r as $n \to \infty$. These are based on the latent roots of R. Alternatively
they could be based on the latent roots of $S_2$ or of $S_3$ (the roots of $S_1$ are
exactly related to those of R, so $S_1$ needs no extra treatment). For brevity,
we shall state the case only for R, and the modifications for $S_2$ and $S_3$ can
be seen to follow immediately.

As is well known, the sum of the latent roots of any matrix is always 
equal to the trace, or the sum of the elements in the main diagonal. Since each 
main diagonal element of R is unity, the sum of its latent roots is always n.
Therefore, the arithmetic mean of the latent roots of R is always unity, for
there are always n latent roots. Hence, except for the trivial case where all
roots of R are equal (or R = I and all variables are uncorrelated), at least
one root of R is greater than unity and at least one is less than unity. Furthermore,
since R is Gramian, all its roots are non-negative, so they must all
be in the interval from zero to n. Hence, one or more roots of R are between
zero and unity, and all the rest are between unity and n, the arithmetic
mean of all n roots being unity.

In general, then, the n latent roots of R have an asymmetric frequency 
distribution about the mean of unity. By recalling the definition above of 
our first lower bound to r, we see that $s_1$ is the number of roots of R that are
greater than or equal to the mean. Conversely, $n - s_1$ is the number of roots
less than the mean. Our first lower bound is thus one feature of the frequency
distribution of the latent roots of R. 
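A small helper that reports the features discussed here may be useful in practice: the latent roots, $s_1$, and the width of the root-free interval straddling unity. The helper and its name are hypothetical. The gap is only a descriptive aid for the heuristic rule that follows, not a test with a known cutoff.

```python
import numpy as np

def root_profile(R):
    """Latent roots of R (descending), s1, and the root-free gap around unity.

    gap = (smallest root >= 1) - (largest root < 1); None if one side is empty.
    """
    roots = np.sort(np.linalg.eigvalsh(R))[::-1]
    above, below = roots[roots >= 1.0], roots[roots < 1.0]
    s1 = len(above)
    gap = float(above.min() - below.max()) if len(above) and len(below) else None
    return roots, s1, gap

loadings = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical one-factor example
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
roots, s1, gap = root_profile(R)
```

Because roots equal to unity sit on the boundary, floating-point data may call for a small tolerance around 1.0 before counting.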

What will happen to the frequency distribution of latent roots as n 
increases? This is a crucial question if we are concerned about the entire 
universe of content. The arithmetic mean, of course, will remain invariant 
at unity. But will $s_1$ change? Evidently, no general answer can be given;
it depends on the nature of the empirical data. However, the following
heuristic considerations may be proposed. If, for a given n, $s_1$ is relatively
small while the distribution of latent roots is clearly bimodal (with one
cluster well above unity, another cluster clearly below unity, and no roots
in the neighborhood of unity), then it may be expected that this bimodality
will be maintained as n increases, and in particular that $s_1$ will not increase
much, if at all, as n increases. Indeed, if n is not insubstantial but $s_1$ is very
small, it might be hypothesized that this $s_1$ was essentially the limiting value
of r as $n \to \infty$.

This heuristic rule, of course, cannot be stated easily in more precise
terms as to what a “substantial” value for n is, nor what a “relatively small”
$s_1$ is, nor how large the empty interval around unity should be for the roots.
Regardless, it may be of some value in practice. 

7. Inference About an Infinite Universe of Variables. One assumption
underlying the above heuristic rule is that there may be in practice a certain
inertia for bimodality of the latent roots as n increases, if such bimodality
exists at all. A more rigorous consideration is the fact that, if r (the minimum
rank of a Gramian G for R) remains constant as n increases, then the
largest latent root of R must in general become infinite as $n \to \infty$. More
precisely, the following theorem is true.

Theorem 1. Let the minimum rank r of Gramian G for R be constant for
all n sufficiently large*. Let $s_1(n)$ be the number of latent roots of R not less than
unity for the indicated value of n. Suppose there exists a fixed non-negative
number δ such that

$$0 \le \delta < 1, \qquad (13)$$

and that $n - s_1(n)$ of the latent roots of R are not greater than δ for all n. Then
the largest latent root of R must increase without limit as $n \to \infty$.


*One can be more precise by specifying a constant r for all n > 2r, for then the 
communalities that are implied to exist here must be unique, as shown for example in 
(2), or the smallest common-factor space is the same for all n > 2r. 


The proof of the theorem is simple, when we recognize that $s_1(n)$ is a
lower bound to r as asserted in §5 above (and as will be proved in §8 below).
Let $\Sigma_1(n)$ denote the sum of the $s_1(n)$ roots that are not less than unity,
and let $\Sigma_2(n)$ denote the sum of the remaining $n - s_1(n)$ roots. Each of
the latter is not greater than δ, by the hypotheses of the theorem, so

$$\Sigma_2(n) \le \delta\,[n - s_1(n)], \qquad (14)$$

or, dropping the negative term on the right,

$$\Sigma_2(n) \le \delta n. \qquad (15)$$

Now, the sum of all the latent roots of R is always n, or

$$\Sigma_1(n) + \Sigma_2(n) = n. \qquad (16)$$

From (15) and (16), we see that

$$\Sigma_1(n) \ge (1 - \delta)\,n. \qquad (17)$$

Therefore, from (13) and (17),

$$\lim_{n \to \infty} \Sigma_1(n) = +\infty, \qquad (18)$$

or the sum of the $s_1(n)$ largest roots of R increases indefinitely as n increases.
Since only a finite number of terms is involved in the left of (18), at most r,
it must be that at least one latent root of R must increase indefinitely with
n, or the theorem is established.

From (17), we see that the smaller δ is, the higher the floor to the sum
of the largest roots. That is, the closer δ is to zero, the more must the largest
roots exceed unity on the average. This is a necessary condition for there to
remain a finite r for R as n increases.

It should be noted that hypothesis (13) is essential to Theorem 1. If
δ should equal unity, or if δ should be a function of n and tend to unity as a
limit, the theorem need not hold. On the other hand, hypothesis (13) can be
weakened somewhat by letting δ be a function of n and only requiring the
right member of (17) to increase indefinitely with n.
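Theorem 1 can be illustrated with a one-factor matrix in which every loading equals a hypothetical value l. Its latent roots are $1 + (n-1)l^2$ once and $1 - l^2$ with multiplicity n − 1. So $\delta = 1 - l^2 < 1$ satisfies (13), $s_1(n) = 1$, and the largest root must grow without limit. The loop below, a sketch with an assumed l = .6, prints the largest root next to the floor $(1 - \delta)n$ from (17).

```python
import numpy as np

def equal_loading_R(n, loading):
    """One-factor correlation matrix with every loading equal (hypothetical example)."""
    R = np.full((n, n), loading ** 2)
    np.fill_diagonal(R, 1.0)
    return R

loading = 0.6
delta = 1 - loading ** 2            # every root except the largest equals this value
for n in (5, 20, 80):
    roots = np.linalg.eigvalsh(equal_loading_R(n, loading))   # ascending order
    largest = roots[-1]
    print(n, round(largest, 3), round((1 - delta) * n, 3))
```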

8. Proof of the General Formula for Lower Bounds. The three lower bounds 
of §5 are all special cases of a more general formula. The general case will 
now be established. It is based on the following lemma. 

Lemma 1. Let S be any real symmetric matrix, $G_0$ a non-singular real
Gramian matrix, and H their sum:

$$H = S + G_0. \qquad (19)$$

Let h be the positive index of H, and let s be the non-negative index of S. Then

$$h \ge s. \qquad (20)$$
For the proof, we first observe that since $G_0$ is Gramian and non-singular,
there exists a non-singular matrix F such that (5):

$$G_0 = FF'. \qquad (21)$$

Let $H_0$ and $S_0$ be defined respectively as:

$$H_0 = F^{-1} H (F')^{-1}, \qquad S_0 = F^{-1} S (F')^{-1}. \qquad (22)$$

Then, premultiplying both members of (19) by $F^{-1}$ and postmultiplying by
$(F')^{-1}$, and using (22) and (21), yield

$$H_0 = S_0 + I. \qquad (23)$$

From (23), the latent roots of $H_0$ are the same as those of $S_0$, each increased
by unity. Therefore, to each positive root of $S_0$ there corresponds a positive
root of $H_0$. Furthermore, to each root of zero of $S_0$ there corresponds a
positive root (unity) of $H_0$. Therefore, if $h_0$ is the number of positive roots
of $H_0$ and if $s_0$ is the number of non-negative roots of $S_0$, it must be that

$$h_0 \ge s_0. \qquad (24)$$

The equality in (24) holds if and only if no latent root of $S_0$ lies strictly
between −1 and 0, since each such root becomes a positive root of $H_0$ without
being counted in $s_0$. For the practical uses we shall make of the Lemma, we shall not
know in general how many roots of $S_0$ are less than or equal to −1, so we
shall not know in practice when the equality in (24) holds.

Now, from (22) and Sylvester’s “law of inertia,” the number of latent
roots of $H_0$ of a given sign is the same as for H. The same kind of invariance
holds between the signs of the roots of $S_0$ and S. Hence, we can write

$$h_0 = h, \qquad s_0 = s. \qquad (25)$$

Then inequality (20) follows from (24) and (25), or the Lemma is established.
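Lemma 1 can be spot-checked numerically. The sketch below is an illustration, not a proof. It draws random symmetric S and random positive definite $G_0$ (the small ridge $10^{-3} I$ is an assumption that guarantees non-singularity) and counts violations of $h \ge s$, of which there should be none. Two diagonal cases with $G_0 = 0.1 I$, so that $S_0 = 10S$, show the equality condition: a root of $S_0$ strictly between −1 and 0 makes the inequality strict.

```python
import numpy as np

def positive_index(M, tol=1e-9):
    return int(np.sum(np.linalg.eigvalsh(M) > tol))

def nonnegative_index(M, tol=1e-9):
    return int(np.sum(np.linalg.eigvalsh(M) >= -tol))

def lemma_counts(S, G0):
    """Return (h, s): positive index of H = S + G0 and non-negative index of S."""
    return positive_index(S + G0), nonnegative_index(S)

rng = np.random.default_rng(4)
violations = 0
for _ in range(200):
    A = rng.standard_normal((5, 5))
    S = (A + A.T) / 2                      # arbitrary real symmetric matrix
    B = rng.standard_normal((5, 5))
    G0 = B @ B.T + 1e-3 * np.eye(5)        # non-singular Gramian matrix
    h, s = lemma_counts(S, G0)
    violations += int(h < s)

G0 = 0.1 * np.eye(3)                                         # here S_0 = 10 S
equal_case = lemma_counts(np.diag([1.0, 0.0, -0.5]), G0)     # S_0 roots 10, 0, -5
strict_case = lemma_counts(np.diag([1.0, 0.0, -0.05]), G0)   # S_0 roots 10, 0, -0.5
```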


From Lemma 1 we can immediately deduce the following theorem:

Theorem 2. Let R be any Gramian matrix, and let E be any diagonal
matrix such that G is also Gramian, where G is defined by:

$$G = R - E. \qquad (26)$$

Let D be any diagonal matrix such that

$$d_j > e_j \qquad (j = 1, 2, \ldots, n), \qquad (27)$$

where $d_j$ and $e_j$ are the jth diagonal elements of D and E respectively. Let

$$S = R - D. \qquad (28)$$

Let s be the non-negative index of S, and let r be the rank of G. Then

$$r \ge s. \qquad (29)$$
For the proof, we note from (26) and (28) that

$$G = S + (D - E). \qquad (30)$$

Now, from (27), D − E is a diagonal matrix with all main diagonal elements
positive. Therefore, D − E is a non-singular Gramian matrix. Hence, we
can identify (30) with (19), by letting G = H and $D - E = G_0$. Then
(29) is a special case of (20), since the number of positive roots of G equals
its rank, G being Gramian; and Theorem 2 is established.

Our three lower bounds are each special cases of Theorem 2. In each
case, $E = U^2$. For $s_1$, we set D = I, so (27) certainly holds. Consequently
(29) holds for $s = s_1$.

For $s_2$, we set $D = D_2$ in (28). That (27) then holds follows from the
theorem proved elsewhere (4) that $R_j^2 < h_j^2$ ($j = 1, 2, \ldots, n$) for non-singular
R. Hence (29) holds for $s = s_2$.

Since a multiple correlation coefficient is never less in absolute value
than any lower order correlation, it must be that $r_j^2 \le R_j^2$, so that also
$r_j^2 < h_j^2$ ($j = 1, 2, \ldots, n$) by the same theorem referred to in the previous
paragraph. Hence, setting $D = D_3$ in (28) necessarily implies that (29)
holds for $s = s_3$.

9. Proof of the Order Among the Lower Bounds. That $s_2$ is a better bound
than $s_1$ also follows from Lemma 1. We note that, from (7) and (8),

$$S_2 = S_1 + (I - D_2). \qquad (31)$$

Now, the jth diagonal element of $I - D_2$ is $R_j^2$, which is positive for every j
provided each observed variable has a non-zero correlation with at least one other.
Hence $I - D_2$ is then a non-singular Gramian matrix, and can serve as $G_0$ in (19).
Setting $S = S_1$ and $H = S_2$, and remarking that consequently

$$s_2 \ge h, \qquad (32)$$

since $s_2$ is the number of non-negative roots of $S_2$ while h is only the number
of positive roots of $S_2$, we have from (20) and (32) that

$$s_2 \ge s_1. \qquad (33)$$

Similarly, it can be proved that $s_3 \ge s_1$ and that $s_2 \ge s_3$, by noting
that $I - D_3$ and $D_3 - D_2$ are each Gramian matrices, rewriting (31) appropriately
for each case, and using Lemma 1. Thus we have established the
continued inequality (12) above.

10. The Special Case of Equal Uniquenesses. In general, only a lower 
bound can be set to the minimum value for r for a Gramian G in (2). A special 
case where the minimum rank can be determined exactly is that of equal 


uniquenesses. Assume that all n uniquenesses have a common value wu’: 
u=u (j = 1,2, +++ ,n). (34) 
Then U” is a scalar matrix, and we can write 
U? = ul. (35) 
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Then (2) can be rewritten as 


G=R-w. (36) 
From (36), if \; is the jth latent root of R, then \; — wu’ is a latent root 
of G (j = 1, 2, --- , n). But a necessary and sufficient condition for G to be 


Gramian is that all its latent roots be non-negative. Hence, for u’ to leave 
G Gramian it must be that 


42u (j = 1,2, +++ ,n). (37) 


The number of zero roots of G will be the number of times the equality holds 
in (37). Hence, the following theorem: 

Theorem 3. For the case where all the uniquenesses are equal, the minimum
rank for a Gramian $G$ in (2) is obtained when the smallest latent root of $R$ is
used for each uniqueness. If $r$ is the minimum rank, then $n - r$ is the multipli-
city of this smallest root.

We immediately have the following corollary: 

Corollary. The minimum rank for a Gramian $G$ for any $R$, as in (2), is
not greater than $n - 1$.

This follows from Theorem 3, since we can always consider a set of uniquenesses
all equal to the smallest latent root of $R$, and its multiplicity is at
least unity. So $n - r$ in the theorem is not less than unity, or $r \leq n - 1$.
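Theorem 3 and its Corollary reduce to a single latent-root computation. A minimal sketch, assuming NumPy; the function names and the tolerance used to decide when two computed roots are "equal" are illustrative choices, not part of the theorem. The equicorrelation matrix used as an example has latent roots $1 + (n - 1)\rho$ (once) and $1 - \rho$ ($n - 1$ times).

```python
import numpy as np


def min_rank_equal_uniqueness(R, tol=1e-8):
    """Minimum rank of a Gramian G = R - u^2 I when all uniquenesses equal u^2.

    By (37), G is Gramian only if u^2 <= every latent root of R, so the
    rank-minimizing choice is u^2 = smallest root, and n - r equals the
    multiplicity of that root (Theorem 3).
    """
    R = np.asarray(R, dtype=float)
    roots = np.linalg.eigvalsh(R)  # ascending order
    u2 = roots[0]
    multiplicity = int(np.sum(np.abs(roots - u2) <= tol))
    return u2, R.shape[0] - multiplicity


def equicorrelation(n, rho):
    """Correlation matrix whose off-diagonal entries all equal rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


# rho = 0.3, n = 4: roots are 1.9 once and 0.7 three times,
# so mathematically u^2 = 0.7 and the minimum rank is 1.
u2, r = min_rank_equal_uniqueness(equicorrelation(4, 0.3))
```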

11. Cases Where $r$ Cannot be Small Compared with $n$. We have just seen
how $U^2$ can always be determined to have $r \leq n - 1$. But common-factor
theory in the Spearman-Thurstone sense requires far more than this. It
requires $r$ to be very small compared with $n$, especially as $n$ increases.

We shall now show many examples wherein a parsimonious common- 
factor structure in the Spearman-Thurstone sense is impossible. 

Consider the case where $R$ has only two distinct latent roots. Say that
the two distinct values are $\lambda_1$ and $\lambda_2$, respectively, where $\lambda_1 > \lambda_2$. Then,
since the (weighted) mean of these two values must be unity, we have

$$\lambda_1 > 1, \qquad \lambda_2 < 1. \qquad (38)$$


Let $p$ be the multiplicity of $\lambda_1$, so that $n - p$ is the multiplicity of $\lambda_2$. From
the lower bound $s_1$, we have immediately that for any Gramian $G$ derivable
from $R$,

$$r \geq p. \qquad (39)$$
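The "weighted mean" step behind (38) can be written out. Because every diagonal element of $R$ is unity, the sum of its latent roots, counted with multiplicity, is $n$:

$$
p\,\lambda_1 + (n - p)\,\lambda_2 = \operatorname{tr} R = n,
\qquad\text{so}\qquad
\frac{p}{n}\,\lambda_1 + \frac{n - p}{n}\,\lambda_2 = 1 .
$$

Since $0 < p < n$ and $\lambda_1 > \lambda_2$, unity lies strictly between the two roots, which is (38).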


In particular, if $p = n - 1$, then from (39) and the Corollary to Theorem 3
above, we have

$$r = n - 1 \quad (p = n - 1), \qquad (40)$$

or $r$ cannot at all be parsimonious compared with $n$.
Indeed, this is an example of a “Heywood” case (11). For when
$p = n - 1$, then all tetrads, or second-order minors, in $R$ vanish that do not
involve the main diagonal. To see this, consider the matrix $R - \lambda_1 I$. This
has $p$ zero roots and $n - p$ negative ones ($= \lambda_2 - \lambda_1$), and so is a
symmetric (but non-Gramian) matrix of rank $n - p$. Hence, all its minors vanish
that are of order larger than $n - p$; or in $R$, which differs from $R - \lambda_1 I$ only in
the main diagonal, all minors of order $n - p + 1$ or larger, and that do not involve
the main diagonal, must vanish. When $p = n - 1$, we have $n - p = 1$, or the
Heywood case. All tetrads in $R$ outside the main diagonal vanish, yet the minimum
rank for a Gramian $G$ is $n - 1$.
(For a non-Gramian $G$, in this case, the minimal rank is unity.)
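A concrete instance of the case $p = n - 1$ (our choice of example, not one given in the text) is an equicorrelation matrix with a negative common correlation $\rho$, $-1/(n-1) < \rho < 0$: its latent roots are $1 - \rho > 1$ with multiplicity $n - 1$ and $1 + (n - 1)\rho < 1$ once. The sketch below assumes NumPy and the illustrative values $n = 4$, $\rho = -0.2$. It checks that every diagonal-free tetrad vanishes, that $s_1 = n - 1$, that the equal-uniqueness choice of Theorem 3 gives a Gramian $G$ of rank $n - 1$, and that $R - \lambda_1 I$ is a rank-one $G$ with negative diagonal entries, and hence non-Gramian.

```python
import itertools

import numpy as np

n, rho = 4, -0.2
R = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))
roots = np.linalg.eigvalsh(R)  # ascending: 0.4 once, then 1.2 three times
lam2, lam1 = roots[0], roots[-1]


def max_diagonal_free_tetrad(R):
    """Largest |r_ij r_kl - r_il r_kj| over 2x2 minors whose rows {i, k}
    and columns {j, l} are disjoint, i.e. minors avoiding the main diagonal."""
    size = R.shape[0]
    worst = 0.0
    for i, k in itertools.combinations(range(size), 2):
        cols = [c for c in range(size) if c not in (i, k)]
        for j, l in itertools.combinations(cols, 2):
            worst = max(worst, abs(R[i, j] * R[k, l] - R[i, l] * R[k, j]))
    return worst


s1 = int(np.sum(roots >= 1.0))    # non-negative roots of R - I
G_gramian = R - lam2 * np.eye(n)  # equal uniquenesses u^2 = lambda_2 (Theorem 3)
G_heywood = R - lam1 * np.eye(n)  # rank one, diagonal entries equal rho < 0

print(max_diagonal_free_tetrad(R), s1)
print(np.linalg.matrix_rank(G_gramian, tol=1e-8),
      np.linalg.matrix_rank(G_heywood, tol=1e-8))
```

The explicit `tol` in `matrix_rank` is a design choice: it keeps round-off in the computed roots from being mistaken for a nonzero root.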

We have in effect generalized the Heywood case to $p \leq n - 1$. We
have shown that many possible correlation matrices are such that, although
all diagonal-free minors of a given small order ($n - p$ small) vanish, yet no
communalities exist that will yield a Gramian $G$ of the same or comparable
small rank ($r \geq p$, $p$ large). Therefore, merely studying the minors outside
the main diagonal, as suggested by Thurstone (12), is not sufficient for 
determining communalities or the minimum possible rank for a Gramian G. 

Many further examples can be exhibited, with more than two distinct 
latent roots for R, leading to the same conclusion. 

The question as to whether a parsimonious common-factor system
exists at all for a given set of data remains a fundamental one to be reexplored 
in each empirical case. Current computing procedures which aim at stopping 
at some relatively small number of common-factors may prejudge the issue. 
In many cases, an order-factor system, such as the radex, may be more 
appropriate than a finite common-factor system (6, 7, 8). 
NOTE ON “A TABLE FOR THE RAPID DETERMINATION OF THE
TETRACHORIC CORRELATION COEFFICIENT” 


Melvin D. Davidoff


U. S. CIVIL SERVICE COMMISSION 


This note revises the figures in the original article concerning the 
accuracy of the tetrachoric estimates involved. These estimates are better 
than previously noted and are very satisfactory. Some minor known errors 
in the original article are also noted. 


Since the publication of “A Table for the Rapid Determination of the
Tetrachoric Correlation Coefficient” (2), the accuracy check values in Table
3 have been recomputed using six terms [the number used in the construction
of the Chesire, Saffir and Thurstone tables (1)] of the series on the bottom of
page 117 (as corrected in this note) with theoretical cell frequencies rounded
to two places. Only two minor changes took place in $r_t$ and $Q_3$ values.
The whole of Table 3 of the original article was then recomputed using six
terms of the series as above but carrying the theoretical frequency proportions
to three places. The results in the revised Table 3 have in most instances
shown $r_t$ equal or very close to $r_{tet}$, and $Q_3$ closer to $r_{tet}$ than in the previous
table. The fluctuation of $Q_3$ and $r_t$ values with changes in third-place
frequency proportion figures, regardless of the size of the $N$ on which proportional
frequencies are based, is indicative of the basically unstable nature of the
tetrachoric correlation coefficient. In any event, $Q_3$ is shown to be a very
good estimate of $r_{tet}$.
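For readers who want to reproduce a value like those in the $r_t$ column, the sketch below solves a six-term tetrachoric series for $r$ by bisection. It assumes the standard Pearson form of the series, $(ad - bc)/(y\,y') = \sum_{m \geq 1} \frac{r^m}{m!}\,He_{m-1}(h)\,He_{m-1}(k)$. Here the cells are proportions, $h$ and $k$ are the normal deviates whose upper-tail areas are $a + b$ and $a + c$, $y$ and $y'$ are the normal ordinates at $h$ and $k$, and $He$ denotes the probabilists' Hermite polynomials. In this form the fourth term is $\frac{r^4}{4!}\,hk(h^2 - 3)(k^2 - 3)$. The printed series of the original article may be arranged differently, and the function names are hypothetical. Tables with a nearly empty cell can make the truncated series unreliable, and this sketch does not guard against that.

```python
from math import factorial
from statistics import NormalDist


def hermite_he(m, x):
    """Probabilists' Hermite polynomial He_m(x), from He_{i+1} = x He_i - i He_{i-1}."""
    if m == 0:
        return 1.0
    prev, cur = 1.0, x
    for i in range(1, m):
        prev, cur = cur, x * cur - i * prev
    return cur


def tetrachoric_series(a, b, c, d, terms=6):
    """Estimate r_tet from a 2x2 table by solving a truncated tetrachoric series.

    Assumptions of this sketch: a is the cell above both dichotomy points,
    a + b and a + c are the corresponding marginals, and counts are first
    converted to proportions (so ad - bc reduces to a - (a + b)(a + c)).
    """
    total = a + b + c + d
    a, b, c = a / total, b / total, c / total
    nd = NormalDist()
    h = nd.inv_cdf(1.0 - (a + b))  # normal deviate for the first marginal
    k = nd.inv_cdf(1.0 - (a + c))  # normal deviate for the second marginal
    target = (a - (a + b) * (a + c)) / (nd.pdf(h) * nd.pdf(k))

    def series(r):
        return sum(
            r ** m / factorial(m) * hermite_he(m - 1, h) * hermite_he(m - 1, k)
            for m in range(1, terms + 1)
        )

    lo, hi = -1.0, 1.0
    f_lo = series(lo) - target
    if f_lo * (series(hi) - target) > 0:
        raise ValueError("truncated series has no sign change on [-1, 1]")
    for _ in range(60):  # bisection: keep the half-interval with a sign change
        mid = 0.5 * (lo + hi)
        f_mid = series(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Cell proportions from the (.5, .5) block of the revised Table 3.
print(round(tetrachoric_series(0.333, 0.167, 0.167, 0.333), 2))
```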

It should be noted by the unwary in using Table 1 of the original article 
that one need not compute ad/bc using proportions but may employ the raw 
frequencies directly, and that if proportions happen to be handier, there is 
not too often reason in practice to employ proportions to more than two 
places. 
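Because $Q_3$ depends on the four cells only through the ratio $ad/bc$, multiplying every cell by $N$ leaves it unchanged. A minimal sketch, assuming that $Q_3$ here is Pearson's cosine-pi formula $Q_3 = \cos\bigl(\pi/(1 + \sqrt{ad/bc})\bigr)$, equivalently $\sin\bigl(\tfrac{\pi}{2}\,(\sqrt{ad} - \sqrt{bc})/(\sqrt{ad} + \sqrt{bc})\bigr)$. The function name is hypothetical. The second call uses the $(.3, .5)$ row at $r_{tet} = .8$, whose corrected $Q_3$ is .83.

```python
from math import cos, isclose, pi, sqrt


def pearson_q3(a, b, c, d):
    """Cosine-pi approximation Q_3 = cos(pi / (1 + sqrt(ad / bc))).

    a, b, c, d may be raw frequencies or proportions: multiplying all four
    cells by N multiplies ad and bc by N**2, so the ratio is unchanged.
    """
    if b * c <= 0:
        raise ValueError("cells b and c must be positive")
    return cos(pi / (1.0 + sqrt((a * d) / (b * c))))


# (a + b) = (a + c) = .2, r_tet = .50: proportions vs. counts out of 1000.
q_prop = pearson_q3(0.087, 0.113, 0.113, 0.687)
q_freq = pearson_q3(87, 113, 113, 687)
print(round(q_prop, 2), isclose(q_prop, q_freq))  # 0.55 True
print(round(pearson_q3(0.271, 0.029, 0.229, 0.471), 2))  # 0.83
```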

Since the original article was published, a few errors in it have come
to the authors’ attention. These are as follows: Due to an authors’ error in
transcription in Table 3, $Q_3$ for an $r_{tet}$ of .8 and $(a + b)$, $(a + c)$ equal to
.3 and .5 was listed as .84 when it should have been .83. The tenth line on
page 116 should read “. . . method or Pearson $Q_3$ values.” The fourth term
of the equation on the bottom of page 117 should have been:


$$\frac{r^4}{4!}\,[hk(h^2 - 3)(k^2 - 3)] + \cdots$$


This error was made by the authors in perseveration of the same error in 
Peters and Van Voorhis (3). This error was kindly called to the attention of
the authors by Frederic Lord of the Educational Testing Service. 
TABLE 3 (Revised) 


Revision based upon six terms of series* 
and three decimal places in cell proportions. 








a + b  a + c  r_tet     a      b      c      d     r_t    Q_3

 .2     .2     .20    .057   .143   .143   .657   .20    .23
               .30    .066   .134   .134   .666   .30    .34
               .50    .087   .113   .113   .687   .50    .55
               .70    .112   .088   .088   .712   .70    .73
               .80    .128   .072   .072   .728   .80    .83
               .90    .145   .055   .055   .745   .88    .90

 .2     .3     .20    .080   .120   .220   .580   .20    .22
               .30    .091   .109   .209   .591   .30    .33
               .50    .115   .085   .185   .615   .50    .54
               .70    .141   .059   .159   .641   .70    .72
               .80    .159   .041   .141   .659   .80    .83
               .90    .178   .022   .122   .678   .90    .92

 .2     .5     .20    .122   .078   .378   .422   .20    .22
               .30    .134   .066   .366   .434   .30    .33
               .50    .157   .043   .343   .457   .50    .56
               .70    .176   .024   .324   .476   .67    .74
               .80    .192   .008   .308   .492   .83    .91
               .90    .203  -.003   .297   .503  [illeg.] [illeg.]

 .3     .3     .20    .115   .185   .185   .515   .20    .21
               .30    .128   .172   .172   .528   .30    .31
               .50    .157   .143   .143   .557   .50    .52
               .70    .190   .110   .110   .590   .70    .71
               .80    .210   .090   .090   .610   .80    .81
               .90    .232   .068   .068   .632   .88    .89

 .3     .5     .20    .178   .122   .322   .378   .20    .21
               .30    .192   .108   .308   .392   .30    .31
               .50    .222   .078   .278   .422   .50    .52
               .70    .254   .046   .246   .454   .70    .73
               .80    .271   .029   .229   .471   .81    .83
               .90    .290   .010   .210   .490  [illeg.] .94

 .5     .5     .20    .282   .218   .218   .282   .20    .20
               .30    .299   .201   .201   .299   .30    .30
               .50    .333   .167   .167   .333   .50    .50
               .70   [illeg.] [illeg.] [illeg.] [illeg.] .70  .70
               .80    .395   .105   .105   .395   .80    .79
               .90    .420   .080   .080   .420   .88    .88

*Series on the bottom of page 117 of the original article by Davidoff and Goheen. The 
series was used as revised in this Note. 


**Note small proportion in cell b. 
BOOK REVIEWS


O. K. Buros, (ed.). The Fourth Mental Measurements Yearbook. Highland Park, New 
Jersey: The Gryphon Press, 1953, pp. 1163 + xxiii, $18. 


The Fourth Mental Measurements Yearbook follows the same plan as its predecessor, 
The Third Mental Measurements Yearbook, and, like it, is intended to supplement, rather 
than supplant, previous volumes of the series. The two main sections of the book are the 
Tests and Reviews section, with 830 entries, and the Books and Reviews section, with 429 
entries. Other useful features are a list of contributing reviewers, a periodical directory 
and index, a publishers directory and index, an index of titles, an index of names, and a 
classified index of tests. 

The eight objectives stated by the Editor for the Tests and Reviews section might be 
characterized in terms of three general purposes: 

1. To provide a bibliography of published tests and research which has been done on them;

2. To give test users valuable information about specific tests; 

3. To exert an influence toward improving the quality of tests. 

The above ordering seems to this reviewer to be that of the extent to which the objectives 
have been attained. As stated in the preface: “The yearbook attempts to list all commercially
available tests (educational, psychological, and vocational) published as separates in
English-speaking countries in the four-year period 1948-1951. The commercially available
tests also include older tests selected for review and tests published during the nineteen-
year period (1933-1951) covered by this series of yearbooks and bibliographies but not
previously listed.” A new feature of this volume is the listing (but only rarely the reviewing)
of tests which are available only through certain restricted sources, such as: Association of 
American Medical Colleges, College Entrance Examination Board, Educational Testing 
Service, Life Insurance Agency Management Association, National League of Nursing 
Education, and Psychological Corporation. 

The bibliography of publications related to the various tests contains a total of 4,417 
titles and the attempt has been made to include all references published and unpublished
“on the construction, validity, use, and limitations of each test . . .” through 1951. This
bibliography certainly constitutes an invaluable source of information for anyone planning 
extensive work with any test on which research findings are available. 

For each test listed there is given, in addition to title, author, and publisher, a de- 
scription of the groups for which the test is intended; copyright or publication date; what 
part scores, if any, are obtained from it; whether the test is an individual or a group test; 
whether it is machine scorable; cost (as of early 1952); and working time and total time 
required. If data on reliability and validity are absent from the manual, this fact is mentioned 
also. 

The distribution of the entries in the Tests and Reviews section over the various 
content fields, and the incidence of reviews for them, are shown in Table 1. Inasmuch as 
there are 277 items which are not reviewed either in the Fourth Yearbook or in previous books 
in the series, it seems pertinent to make the suggestion that information about how the test 
was standardized and validated and some quantitative statement of its reliability and 
validity should be included in the descriptive material provided by the Editor. This is 
information which every test user should want to have anyway, and it is in fact omitted 
from some of the reviews themselves. 
The reviews vary in quality, some of them being thorough and factually-oriented
discussions of points of importance to test users, while others are principally expressions of 
opinions based on the reviewers’ study of the content of the items of the tests. In some 
cases, usually tests which have been reviewed in previous volumes of the series, only certain 
aspects of the tests are considered. This circumstance points up what seems to this reviewer 
to be a serious obstacle in the use of the book: For many tests, evaluation by a prospective 
user requires reference to two, or even more, of the volumes in the series. It is to be hoped 
that in subsequent Yearbooks, Buros may find it possible to consolidate all the reviews 
which have currency, so that reference to but one source is necessary. 

Despite the variability in quality just referred to, the reviews are on the whole service- 
able and in many instances furnish the kind of information relative to applications and 
limitations of the tests which may go far toward preventing their misuse. Quite a number 
of reviews are so adversely critical (and justifiably so) that it is hard to believe that the 
tests in question will receive enough support to justify their publication, if the Mental 
Measurements Yearbooks have any influence at all. 

The second main section of the Yearbook, Books and Reviews, includes 429 titles, 141 
of which are reviewed. To quote from the preface: “An attempt has been made to list all
measurements books published in English-speaking countries in the four-year period 
1948-1951. In addition, a few older books are listed when accompanied by review excerpts 
not previously published in this series. . . . Books on statistical methods in education and 
psychology published in the eleven-year period 1941-1951 are included but without ac- 
companying reviews. Instead, cross references are given to reviews in Statistical Methodology 
Reviews, 1941-1950.” 

Buros is quite frank in expressing his concern over the prospect that it may not be 
possible to publish subsequent Yearbooks unless the sale of the Fourth makes the venture 
a good financial risk. It is difficult to see how the market can be greatly expanded if each 
new volume requires for its full utilization the availability of previous Yearbooks in the 
series. Perhaps the solution to the problem may be in the adoption of some kind of loose- 
leaf format, so that additional reviews of a test could be inserted next to the original ones 
for that test. Such an arrangement would also permit the establishment of a continuing 
review service, so that critical reviews of tests could be made available as the tests them- 
selves appeared or as new findings about their uses were brought out. It would indeed be 
regrettable if this valuable service which Buros is performing should have to be discontinued. 


University of Michigan John E. Milholland 


Hans Reichenbach. The Rise of Scientific Philosophy. Berkeley and Los Angeles: University
of California Press, 1951, pp. xi + 333.


Those who have even a slight acquaintance with contemporary academic philosophy 
are aware of the movement or school which is variously called Logical Positivism or Logical 
Empiricism. It had its beginnings in Vienna and Berlin during the middle twenties, and 
its influence spread quickly. Largely as a result of European political changes during the 
past two decades, most of the surviving leaders of its early period are settled now in the 
United States. Hans Reichenbach, author of The Rise of Scientific Philosophy, was one of 
the earliest members of the Logical Empiricist group in Berlin, and is now Professor of 
Philosophy in the University of California at Los Angeles. His book is the first to present 
a popular exposition of the new philosophy, which is said by the author to be itself a science. 
The Preface asserts that, “. . . this book is written with the intention of showing that
philosophy has proceeded from speculation to science.” (p. vii).
Reichenbach’s book is divided into two parts. The first, occupying one-third of the
book, is headed “The Roots of Speculative Philosophy,” and takes up the first six chapters.
The second part consists of twelve chapters and occupies two-thirds of the book, being
headed “The Results of Scientific Philosophy.” The first part is devoted to an extended
condemnation of the vast bulk of traditional philosophy. Traditional philosophies are 
divided by Reichenbach into just two camps, which he calls rationalism and empiricism. 
The first is condemned for its belief that factual knowledge can be obtained from sources 
other than sense perception. The second is criticized somewhat more mildly for its failure 
to agree that certainty is not a criterion for knowledge, and that so-called certain knowledge, 
like pure mathematics, for example, is not factual. Their belief in the possibility of achieving 
a priori knowledge which is factual or synthetic led rationalist philosophers, we are told, to 
the construction of theories which provide only pseudo-explanations for the problems with 
which they were concerned. And the failure of older empirical philosophers to recognize 
that probability rather than certainty is all that knowledge need possess is what led them 
into the wasteland of barren scepticism, as in the case of Hume. Reichenbach’s criticism is only partly
devoted to refutation, however. For the most part he is interested more in the psychological 
basis of the older philosophies than in their logical shortcomings. Here, as in Dewey, the 
“quest for certainty” is the villain. 

The second part of the book is more constructive, although in every case the author 
repeats his claim that progress could be made only after the synthetic a priori was abandoned 
and the demand for certainty was relinquished in favor of probability. Here Reichenbach 
discusses such varied topics as the nature of geometry, the philosophy of time, causality, 
the atomic theory, evolution, and what he calls “the functional conception of knowledge.”
His treatment of many of these topics is illuminating and very readable, much in the 
tradition of Eddington and Gamow. Reichenbach’s discussion of geometry covers familiar 
ground, for the most part, describing briefly how earlier empirical rules of thumb were 
codified into a deductive system by Euclid, and then describing modern developments at 
somewhat greater length. The modern view, developed subsequent to the introduction of 
non-euclidean geometries, is that one axiomatic system is as “good” as another, regarded
merely as mathematics, but that the question of the nature of real or physical space is an
empirical question for the physicist to determine by observation or experiment. Some
stimulating remarks are made in criticism of the “conventionalist” doctrine of Poincaré,
but the argument is not carried far enough to be completely convincing. 

Included also is a longish chapter on ethics, in which Reichenbach accepts without 
criticism the view of C. L. Stevenson that so-called moral judgments are neither true nor 
false, being what he calls “volitional decisions” rather than propositions or assertions proper. 

An interesting feature of Reichenbach’s book is the parallel treatment he gives to the 
history of philosophy and the history of science. The last chapter of Part One is on classical 
physics, and several chapters in Part Two are more concerned with the development of 
modern science itself than with philosophy. The account is valuable in showing the influence 
of scientific progress on philosophizing—the “old’’ as well as the “new” philosophy was 
subject to this influence. Of course the parallel treatment is not merely an expository 
device, but an essential part of Reichenbach’s argument to the effect that the significant 
questions considered by the older philosophers were of such a nature that they could only 
be answered by science itself, not by philosophy. 

What then is the proper task and the appropriate method of philosophy? According
to Reichenbach, “It is the clarification of meanings through logical analysis . . .” (p. 145),
that is, “Philosophy is logical analysis . . .” (p. 308). Since it is Reichenbach’s view that
“Logic formulates rules of language . . .” (p. 222), logic and philosophy seem to have much
in common with, perhaps even to reduce to, lexicography and grammar, which are the
traditional disciplines dealing with the correct use of words and their meanings. But far more
interesting conclusions are drawn by Reichenbach than by any grammarian. For example,
“The question of the existence of the mind is a matter of the correct use of words but not a
question of facts.” (p. 272). And again, “There is the question of the existence of the ex-
ternal world . . ., which is found to be a question of correct use of language . . .” (p. 307).
These quotations should serve to indicate that Reichenbach has in mind some criterion
for “the correct use of words” other than the facts, including, presumably, the facts of
language usage, to which mere lexicographers and grammarians are bound. Not even a
hint is given, however, of what these criteria consist.

The author’s conception of analysis is not altogether clear. When a scientist speaks 
of analysing his data, what he usually has in mind is the formulation of some hypothesis or 
theory which will account for or explain the observed facts. In this sense of analysis, how- 
ever, it is the traditional method of philosophizing used by the great philosophers from 
Plato through Whitehead. The classical philosophers, of course, “taking all knowledge for
their province,” sought to formulate very general theories to account for data not lying
wholly within the scope of any of the special sciences. Since Reichenbach wishes to dis-
tinguish sharply between his own and the traditional conception of philosophy, he must
have some other sense of ‘analysis’ in mind. But the reviewer is unable to discover what
this new sense of “analysis” might be.

Professional psychologists who read this book will be somewhat taken aback to be 
told that “. . . the human mind . . . is essentially passive in the act of perception.” (p. 66).
And they will perhaps be puzzled as to how to go about fulfilling the new tasks assigned
them by Reichenbach, for example “. . . predetermination through fate is a conception to
be explained by psychology . . .” (p. 105). Some of Reichenbach’s ‘demonstrations’ strike
this reviewer as being somewhat less than convincing. Consider, for example, the following:
“The mind is inseparable from a certain state of bodily organization. It follows that mind
and bodily organization of a certain kind are the same thing.” (pp. 271-2). This argument
introduces a new principle, which we might dub “the identity of indivisibles.” We might
as well argue that since Damon and Pythias are inseparable, they are therefore “the same
thing,” which does not seem to be very logical, and yet follows the same pattern as
Reichenbach’s own argument.

It should be remarked, moreover, that many doctrines which Reichenbach presents as 
though they were universally accepted results of positivistic philosophy would find few 
adherents even from the ranks of other Logical Empiricists. His remarks on probability 
theory, for example, have been presented and argued for by him in numerous publications 
in the past, and yet they seem to have won but little acceptance. The situation is similar 
with respect to his claims for the utility of many-valued logics in quantum theory. 

It is regrettable that Reichenbach felt obliged to refute all previous philosophers 
before presenting his own views, because some of his refutations seem to fall far wide of 
their marks. Reichenbach suggests that had non-euclidean geometries been discovered 
earlier, “Plato’s doctrine of ideas would have been abandoned as lacking its basis in
geometrical knowledge.” (p. 142). Yet Platonic or mathematical realism is strongly defended
by such mathematicians and logicians as Frege and Gödel, who presumably were at least
as well acquainted with non-euclidean geometries as Reichenbach himself. And to say 
that the philosopher Berkeley was a solipsist is as much a historical error as it is a philo- 
sophical one to say that he refuted his own solipsism by writing books; but Reichenbach 
says both these things (p. 267). 

Yet despite its shortcomings in matters of detail, Reichenbach’s book is a provocative 
and forceful argument for a very important contemporary philosophical position. 


University of Michigan Irving M. Copi 
W. Ross Ashby. Design for a Brain. New York: Wiley & Sons, 1952, pp. ix + 260.


The analogy is a device greatly admired by scientists. At least there never seems to 
be a dearth of them. Sometimes the analogy provides momentary interest, but quickly 
fades out of popularity because of a lack of any real contribution to the advancement of 
science. At other times the analogy becomes so useful as to assume the role of a precise
theory or even a factual description of the area of knowledge analogized. 

The analogy has many possible uses in science. Sometimes it serves only to clarify
thinking about a problem by stating it in terms more familiar to the scientist. At other 
times it is used to provide a more exact description of the phenomena. (Such analogies 
usually are somewhat more mathematical than those used simply to put a problem in a 
familiar setting.) At still other times, the analogy provides a means of extrapolating beyond 
the existing data, and of predicting the behavior of phenomena not already known. Probably 
the authors of all analogies hope that this latter benefit will result. Time alone can tell 
whether the predictions are correct. 

Ashby has provided psychologists and physiologists with an analogy which attempts 
primarily to make more understandable some of the known phenomena of the behavior of 
living organisms. His analogy is basically mathematical, and thus stands the chance of 
providing a more exact description of behavior even if predictions made are not borne out by 
experimentation. It is not just mathematical, however, for one of Ashby’s goals is to show 
that complex behavior can be explained with purely mechanistic principles. Throughout 
the book he uses a physical device (the homeostat) to illustrate how various types of behavior 
exhibited by living organisms can be shown to exist in an inanimate object. 

More specifically, Ashby is concerned with showing that adaptive behavior can be 
explained with purely mechanistic principles. Since adaptive behavior covers a lot of 
ground, however, and has often been described as the major distinguishing characteristic 
of animate objects, Ashby’s concern is really quite broad. 

It is impossible to do full justice to the system described by Ashby in a short review. 
Its essential characteristics as seen by the reviewer are as follows, however: 

A system is any arbitrarily selected set of variables, and a variable is any measurable 
quantity which has a value at any instant. The state of the system is simply the numerical 
values of the variables in the system at a given instant. A line of behavior is the relation 
between variables. A phase-space shows the relation between two or more variables, neither 
of which is time. It thus shows the interrelations of these variables, regardless of time. 
For example, hunger of an animal and speed of running down a straight runway might 
both be functions of time, and plots of each against time would be lines of behavior. A 
plot of hunger against speed of running, however, would constitute the phase-space. The 
field is the phase-space containing all lines of behavior from all possible initial states or 
starting points. Thus there can be several curves showing the interrelation between hunger 
and speed of running, if these curves are all started with different initial states. 

In an absolute system, all lines of behavior following a given state are identical, regardless
of how the system got to that particular state. Thus in an absolute system, two lines
of behavior might have started from different states, and eventually ended up in the same 
state. If this happens, the two lines of behavior are identical once they have been in that 
given state. This definition of the absolute system simply makes explicit the assumption of 
complete determinism of the behavior of the organism; for with complete determinism if 
we know the state at a given instant we can completely predict the future course of action. 
If the system is not absolute, then we can have a state followed by two or more different 
lines of behavior, with its consequent lack of complete determinism. 

The lines of behavior, or the system, can either be stable or unstable. They are stable 
if they never leave a region of the phase-space. In this connection systems with feedback 
become important, for such systems usually produce stable lines of behavior. Actually, it
is only true that systems with negative feedback produce stable lines of behavior. With 
positive feedback the system becomes very unstable. In a system with negative feedback, 
all variables are interconnected in such a way that a change in one will produce a change in 
the others. Thus if the system is in a stable state, if one variable is changed, another variable 
is affected in such a way that it quickly brings the value of the first variable back to its 
stable-state value. 

Ashby postulates that adaptive behavior leads to physiological stability, or keeps all 
relevant variables within physiological limits. Now the question becomes: What kind of 
mechanism will ensure that stable systems will result when the conditions to which the 
system is subjected are as many and varied as those encountered by animate organisms?
The principle of ultrastability is used to take care of this problem. An ultrastable system 
is one which has the possibilities of stable lines of behavior, and which furthermore seeks 
those fields which have stable lines of behavior. Such a system can be produced if a step- 
function variable is introduced which interacts with the main variables of the field. The 
essential purpose of the step-function is that it provides a constantly changing parameter 
which determines the phase-space of the main variables. If the step-function interacts with 
the main variables in such a way that if the stable lines of behavior do not result, the step- 
function changes value, then the system will continue to change fields until a stable system 
results. (Actually, the step-function can be thought of as simply another variable in the 
field, which now has one more dimension. Furthermore, such a variable does not really have 
to be a step-function. We could simply say that only one field exists, that this field has a
stable region, and that all lines of behavior converge to this stable region. However, it 
probably does make it easier to think of one of these variables as a step-function which acts 
as a parameter to change the field.) 
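The ultrastability mechanism described above can be sketched as a small simulation. Everything here is an illustrative assumption rather than Ashby's own formulation. Two main variables follow a linear field $dx/dt = A x$, integrated by Euler steps. The step-function is represented by the sequence of matrices $A$ it can select. A jump of the step-function is triggered whenever either main variable leaves the region $|x_i| \leq 1$. The names `run_ultrastable` and `random_fields` are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def random_fields(seed: int = 0) -> Iterator[Matrix]:
    """Hypothetical step-function: each new value selects a random linear field."""
    rng = random.Random(seed)
    while True:
        yield [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]


def run_ultrastable(
    fields: Iterator[Matrix],
    x0: Tuple[float, float] = (0.5, 0.5),
    steps: int = 20000,
    dt: float = 0.01,
    limit: float = 1.0,
) -> Tuple[List[float], Matrix, int]:
    """Euler-integrate dx/dt = A x for two main variables; whenever a main
    variable leaves [-limit, limit], hold it at the boundary and let the
    step-function jump to its next value (a new field A).

    Returns the final state, the field in force at the end, and the number
    of step-function changes.
    """
    a = next(fields)  # current value of the step-function
    x = list(x0)
    changes = 0
    for _ in range(steps):
        # The comprehension reads the old x for both components.
        x = [x[i] + dt * (a[i][0] * x[0] + a[i][1] * x[1]) for i in range(2)]
        if max(abs(v) for v in x) > limit:
            x = [max(-limit, min(limit, v)) for v in x]
            a = next(fields)
            changes += 1
    return x, a, changes


final_state, final_field, n_changes = run_ultrastable(random_fields(seed=1))
print(n_changes, final_field)
```

While an unstable field is in force, the main variables are driven to the boundary and the step-function keeps changing. Once a field is selected whose lines of behavior stay inside the region, no further changes occur. This is the trial-and-error search for a stable field that the review describes.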

Such a system will explain adaptive behavior at an elementary level. The problem 
becomes more difficult, however, when we take into consideration the fact that if a stable 
state is reached, and then the environment is changed and another stable state is reached, 
the first stable state can be shown to still be in existence. To take care of such problems, the 
system is made more complicated, and a multistable system is used. The multistable 
system is composed of many smaller ultrastable systems which are connected together by 
main variables. In such a system, one sub-system could reach a stable level, and this fact 
would then allow another sub-system to reach its stable level without changing the stable 
field of the first system. Obviously this sort of thing could be built up indefinitely to take 
care of problems of greater and greater complexity. 

In very elementary form, these are the ideas that Ashby presents. They are contained 
in his first 18 chapters. An appendix contains six more chapters which treat the problem 
in more rigorous mathematical form. 

It is extremely difficult to evaluate such a proposed system. Certainly there are some 
very interesting concepts here. On the other hand there are some severe deficiencies which 
will undoubtedly limit the usefulness of the system. For example, this way of thinking does 
not explain the role of reward in learning very well. Punishment is easily handled, because 
the effect of punishment is to throw the ultrastable system out of balance, forcing it to 
seek a new value of the step-function which will produce a stable line of behavior. Reward 
could probably best be handled with a Guthrian bias. Its role would be, once a stable field
has been found, to prevent the operation of variables which might throw the system out of
balance and so force it to seek new values of the step-function, values which might produce unstable fields.
However, there is too much evidence that reward plays a much broader role than this in 
increasing the probability that a particular behavior will occur in a given situation again. 

Probably a more serious problem than this, though, is the difficulty of handling 
problems of increasing complexity. For example, if a single ultrastable system is operating,
all learning should be insightful (i.e., sudden). But all learning is not in fact insightful.
The use of the multistable system partly takes care of such problems, by providing for 
a series of sub-learnings each of which could be insightful in turn. Essentially, however,
with such a system we would be driven ultimately to a series of step-functions of
step-functions. It would not take much regression of this sort to produce a completely statistical
description of adaptive behavior, but by then the particular system used to predict the
statistics would have lost much of its value through sheer cumbersomeness.

A third difficulty, in some ways not very important, is the nearly complete lack of
supporting evidence at the neural level. If the system’s purpose is only to describe, by
analogy, the behavior of the entire organism, this difficulty is of little concern. However,
this lack of evidence lowers the chance that the system describes the way things really are.

All in all, the reviewer is fairly pessimistic that Design for a Brain will greatly advance
psychology, whether in a few years or in many. Systems such as this simply do
not provide a framework in which to put the known facts of complex adaptive behavior. 
In many ways one has the feeling that the mathematical model simply describes what we 
already know to be true, but that the description holds only if we confine ourselves to 
relatively simple examples. It does not really explain anything. Perhaps this feeling on the 
part of the reviewer is due to the fact that he already accepts the hypothesis that adaptive 
behavior is deterministic, and thus needs no proof that it can be, particularly when the 
proof is for behavior much simpler than that already assumed to be mechanistic. It seems 
as unnecessary to demonstrate that behavior commonly observed in animate objects can
occur in inanimate objects as it is to demonstrate the converse. Psychology will probably
advance faster through a concerted effort to determine the lawful relations among the variables
we find. Unless we already assume that they are lawful, there is little point in searching for
them. So what have we gained by demonstrating that they can be lawful, in the deterministic sense?

Regardless of this feeling that this analogy (and other similar analogies) does not really
advance the science of psychology, there is much to be gained from reading the book.
Ashby has been as exact as possible in his developments, and at several places has made
quite explicit assumptions which scientists make implicitly in their work. His chapter
on dynamic systems is very good reading and provides a set of definitions from which most
psychologists would profit. Also, his chapter on the animal as a machine
is valuable for its clear statement of a way of thinking about organisms in environments.

Perhaps the best way of summarizing the value of such a book is to say that as a 
particular system which explains how adaptive behavior comes about, it is probably not 
going to be of lasting interest. If, however, we want a very clear statement of what we
mean when we say that adaptive behavior is deterministic or mechanistic, then this book
provides one that is both clear and exact. Perhaps this was the author’s
real intent. If so, then he has succeeded. If his intent was really to do more than that, he 
probably has not. 


Johns Hopkins University W. R. Garner 